NetBackup™ Deployment Guide for Kubernetes Clusters
Upgrade the operators
Depending on the following scenarios, perform the appropriate procedure to upgrade the operators:
Upgrade only
Upgrade and modify additional parameters
Note:
The helm command must be run from the following location:
/VRTSk8s-netbackup-<version>/helm/
To upgrade the operators with new image tags only, without modifying any additional parameters, use the following command:
Note:
Use this command if the operators are deployed using a separate operators helm chart. Separate operators helm charts are supported from version 10.4 onwards.
helm upgrade operators operators-<version>.tgz -n netbackup-operator-system --reuse-values \
  --set msdp-operator.image.tag=20.5-0045 \
  --set nb-operator.image.tag=<version> \
  --set nb-operator.msdp-operator.image.tag=20.5-0045 \
  --set nb-operator.flexsnap-operator.image.tag=<version> \
  --set flexsnap-operator.image.tag=<version>
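The tag-only upgrade above can also be scripted so that each tag is set in one place. The following is a minimal sketch, not part of the product: the variable names (CHART_VERSION, NB_TAG, MSDP_TAG, FLEXSNAP_TAG, DRY_RUN) and the run and upgrade_operators_tags helpers are hypothetical, and the <version> defaults are placeholders that must be replaced with the values for your release. By default the script only prints the command for review.

```shell
#!/bin/sh
# Sketch only: assemble the tag-only operator upgrade from variables.
# All variable and function names here are illustrative assumptions.
CHART_VERSION="${CHART_VERSION:-<version>}"   # placeholder, replace before use
NB_TAG="${NB_TAG:-<version>}"                 # placeholder, replace before use
FLEXSNAP_TAG="${FLEXSNAP_TAG:-<version>}"     # placeholder, replace before use
MSDP_TAG="${MSDP_TAG:-20.5-0045}"             # tag used in the command above

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_operators_tags() {
  run helm upgrade operators "operators-${CHART_VERSION}.tgz" \
    -n netbackup-operator-system --reuse-values \
    --set "msdp-operator.image.tag=${MSDP_TAG}" \
    --set "nb-operator.image.tag=${NB_TAG}" \
    --set "nb-operator.msdp-operator.image.tag=${MSDP_TAG}" \
    --set "nb-operator.flexsnap-operator.image.tag=${FLEXSNAP_TAG}" \
    --set "flexsnap-operator.image.tag=${FLEXSNAP_TAG}"
}

upgrade_operators_tags
```

Run the script from the helm directory noted earlier. Once the printed command looks correct, set DRY_RUN=0 to execute it. Using one variable for both msdp-operator tags and one for both flexsnap-operator tags keeps the copies under nb-operator in step with the operator images themselves.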
Perform the following steps to upgrade the operators when modifying the parameters in addition to the tags:
Extract the operators-values.yaml file from the helm package. Use the following command to save the operators chart values to a file:
helm show values operators-<version>.tgz > operators-values.yaml
Use the following command to obtain the values from the current helm release (to be used as reference):
helm get values operators -n netbackup-operator-system
Use the following command to edit the chart values to match your deployment scenario:
vi operators-values.yaml
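After editing, it can help to confirm that the image tags in the file agree with each other, because the nb-operator section carries its own copies of the msdp-operator and flexsnap-operator tags for version checking. The following is a rough sketch under the assumption that every tag appears on its own line as "tag: value"; the function name list_image_tags is hypothetical and the check is a plain text scan, not a YAML parser.

```shell
#!/bin/sh
# Sketch only: list each distinct image tag in a values file with the number
# of times it occurs. Comment lines are skipped because their first field is
# "#", not "tag:". The function name is an illustrative assumption.
list_image_tags() {
  awk '$1 == "tag:" { print $2 }' "$1" | sort | uniq -c
}

# Example usage, assuming the edited file is in the current directory:
#   list_image_tags operators-values.yaml
```

In a consistent file, the msdp-operator tag and the flexsnap-operator tag each appear twice and the nb-operator tag appears once. A tag that you expected to be shared but that shows a count of 1 suggests that one of the copies was missed during editing.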
Following is an example of the operators-values.yaml file:
# Default values for operators.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

global:
  # Toggle for platform-specific features & settings
  # Microsoft AKS: "aks"
  # Amazon EKS: "eks"
  platform: "eks"
  # This specifies a container registry that the cluster has access to.
  # NetBackup images should be pushed to this registry prior to applying this
  # Environment resource.
  # Example Azure Container Registry name:
  # example.azurecr.io
  # Example AWS Elastic Container Registry name:
  # 123456789012.dkr.ecr.us-east-1.amazonaws.com
  containerRegistry: "364956537575.dkr.ecr.us-east-1.amazonaws.com/engdev"
  operatorNamespace: "netbackup-operator-system"
  # By default pods will get spun up in timezone of node, timezone of node is UTC in AKS/EKS
  # through this field one can specify the different timezone
  # example : /usr/share/zoneinfo/Asia/Kolkata
  timezone: null
  storage:
    eks:
      fileSystemId: fs-0411809d90c60aed6
    aks:
      #storageAccountName and storageAccountRG required if user wants to use existing storage account
      storageAccountName: null
      storageAccountRG: null

msdp-operator:
  image:
    name: msdp-operator
    # Provide tag value in quotes eg: '17.0'
    tag: "20.5-0027"
    pullPolicy: Always
  namespace:
    labels:
      control-plane: controller-manager
  # This determines the path used for storing core files in the case of a crash.
  corePattern: "/core/core.%e.%p.%t"
  # This specifies the number of replicas of the msdp-operator controllers
  # to create. Minimum number of supported replicas is 1.
  replicas: 2
  # Optional: provide label selectors to dictate pod scheduling on nodes.
  # By default, when given an empty {} all nodes will be equally eligible.
  # Labels should be given as key-value pairs, ex:
  # agentpool: mypoolname
  nodeSelector:
    agentpool: nbupool
  # Storage specification to be used by underlying persistent volumes.
  # References entries in global.storage by default, but can be replaced
  storageClass:
    name: nb-disk-premium
    size: 5Gi
  # Specify how much of each resource a container needs.
  resources:
    # Requests are used to decide which node(s) should be scheduled for pods.
    # Pods may use more resources than specified with requests.
    requests:
      cpu: 150m
      memory: 150Mi
    # Optional: Limits can be implemented to control the maximum utilization by pods.
    # The runtime prevents the container from using more than the configured resource limits.
    limits: {}
  logging:
    # Enable verbose logging
    debug: false
    # Maximum age (in days) to retain log files, 1 <= N <= 365
    age: 28
    # Maximum number of log files to retain, 1 <= N <= 20
    num: 20

nb-operator:
  image:
    name: "netbackup/operator"
    tag: "10.5-xxxx"
    pullPolicy: Always
  # nb-operator needs to know the version of msdp and flexsnap operators for webhook
  # to do version checking
  msdp-operator:
    image:
      tag: "20.5-0027"
  flexsnap-operator:
    image:
      tag: "10.5.x.x-xxxx"
  namespace:
    labels:
      nb-control-plane: nb-controller-manager
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: nbupool
  #loglevel:
  # "-1" - Debug (not recommended for production)
  # "0" - Info
  # "1" - Warn
  # "2" - Error
  loglevel:
    value: "0"

flexsnap-operator:
  replicas: 1
  namespace:
    labels: {}
  image:
    name: "veritas/flexsnap-deploy"
    tag: "10.5.x.x-xxxx"
    pullPolicy: Always
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: nbupool

Execute the following command to upgrade the operators:
helm upgrade --install operators operators-<version>.tgz -f operators-values.yaml -n netbackup-operator-system
Or
If using the OCI container registry, use the following command:
helm upgrade --install operators oci://abcd.veritas.com:5000/helm-charts/operators --version <version> -f operators-values.yaml -n netbackup-operator-system
Use the following command to verify the operator pod status:
kubectl get all -n netbackup-operator-system
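Instead of re-running the status command by hand, the check can wait until the operator pods report Ready. The following is a minimal sketch, not a documented procedure: the TIMEOUT value of 300s is an assumption (this guide does not state how long the operators take to start), the run and wait_for_operators helpers are hypothetical, and the sketch assumes that every pod in the namespace is expected to reach the Ready condition. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: wait for the operator pods to become Ready, then list them.
# NAMESPACE comes from the commands above; TIMEOUT is an assumed value.
NAMESPACE="netbackup-operator-system"
TIMEOUT="${TIMEOUT:-300s}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

wait_for_operators() {
  # kubectl wait returns non-zero if any pod is not Ready before the timeout,
  # in which case the listing step is skipped.
  run kubectl wait --for=condition=Ready pods --all \
    -n "$NAMESPACE" --timeout="$TIMEOUT" &&
  run kubectl get pods -n "$NAMESPACE"
}

wait_for_operators
```

Set DRY_RUN=0 to run the commands against the cluster. If the wait times out, inspect the pods that are not Ready with kubectl describe pod in the same namespace.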