Important Update: Cohesity Products Documentation
All Cohesity product documentation is now managed via the Cohesity Docs Portal: https://docs.cohesity.com/HomePage/Content/home.htm. Some documentation available here may not reflect the latest information or may no longer be accessible.
Cohesity Cloud Scale Technology Manual Deployment Guide for Kubernetes Clusters
- Introduction
- Section I. Configurations
- Prerequisites
- Preparing the environment for NetBackup installation on Kubernetes cluster
- Prerequisites for Snapshot Manager (AKS/EKS)
- Prerequisites for Kubernetes cluster configuration
- Prerequisites for Cloud Scale configuration
- Prerequisites for deploying environment operators
- Prerequisites for using private registry
- Recommendations and Limitations
- Configurations
- Configuration of key parameters in Cloud Scale deployments
- Tuning touch files
- Setting maximum jobs per client
- Setting maximum jobs per media server
- Enabling intelligent catalog archiving
- Enabling security settings
- Configuring email server
- Reducing catalog storage management
- Configuring zone redundancy
- Enabling client-side deduplication capabilities
- Parameters for logging (fluentbit)
- Managing media server configurations in Web UI
- Prerequisites
- Section II. Deployment
- Section III. Monitoring and Management
- Monitoring NetBackup
- Monitoring Snapshot Manager
- Monitoring fluentbit
- Monitoring MSDP Scaleout
- Managing NetBackup
- Managing the Load Balancer service
- Managing PostgreSQL DBaaS
- Managing logging
- Performing catalog backup and recovery
- Section IV. Maintenance
- PostgreSQL DBaaS Maintenance
- Patching mechanism for primary, media servers, fluentbit pods, and postgres pods
- Upgrading
- Cloud Scale Disaster Recovery
- Uninstalling
- Troubleshooting
- Troubleshooting AKS and EKS issues
- View the list of operator resources
- View the list of product resources
- View operator logs
- View primary logs
- Socket connection failure
- Resolving an issue where external IP address is not assigned to a NetBackup server's load balancer services
- Resolving the issue where the NetBackup server pod is not scheduled for a long time
- Resolving an issue where the Storage class does not exist
- Resolving an issue where the primary server or media server deployment does not proceed
- Resolving an issue of failed probes
- Resolving issues when media server PVs are deleted
- Resolving an issue related to insufficient storage
- Resolving an issue related to invalid nodepool
- Resolving an issue related to KMS database
- Resolving an issue related to pulling an image from the container registry
- Resolving an issue related to recovery of data
- Check primary server status
- Pod status field shows as pending
- Ensure that the container is running the patched image
- Getting EEB information from an image, a running container, or persistent data
- Resolving the certificate error issue in NetBackup operator pod logs
- Pod restart failure due to liveness probe time-out
- NetBackup messaging queue broker takes more time to start
- Host mapping conflict in NetBackup
- Issue with capacity licensing reporting which takes a long time
- Local connection is treated as an insecure connection
- Backing up data from Primary server's /mnt/nbdata/ directory fails with primary server as a client
- Storage server not supporting Instant Access capability on Web UI after upgrading NetBackup
- Taint, Toleration, and Node affinity related issues in cpServer
- Operations performed on cpServer in environment.yaml file are not reflected
- Elastic media server related issues
- Failed to register Snapshot Manager with NetBackup
- Post Kubernetes cluster restart, flexsnap-listener pod went into CrashLoopBackoff state or pods were unable to connect to flexsnap-rabbitmq
- Post Kubernetes cluster restart, issues observed in case of containerized Postgres deployment
- Request router logs
- Issues with NBPEM/NBJM
- Issues with logging feature for Cloud Scale
- The flexsnap-listener pod is unable to communicate with RabbitMQ
- Job remains in queue for a long time
- Extracting logs if the nbwsapp or log-viewer pods are down
- Helm installation failed with bundle error
- Deployment fails with private container registry and Postgres fails to pull the images
- Troubleshooting AKS-specific issues
- Troubleshooting EKS-specific issues
- Resolving the primary server connection issue
- NetBackup Snapshot Manager deployment on EKS fails
- Wrong EFS ID is provided in cloudscale-values.yaml file
- Primary pod is in ContainerCreating state
- Webhook displays an error for PV not found
- Cluster Autoscaler initialization issue
- Catalog backup job fails with an error (Status 9202)
- Troubleshooting issue for bootstrapper pod
- Troubleshooting issues for kubectl plugin
- Appendix A. CR template
- Appendix B. MSDP Scaleout
- About MSDP Scaleout
- Prerequisites for MSDP Scaleout (AKS/EKS)
- Limitations in MSDP Scaleout
- MSDP Scaleout configuration
- Installing the docker images and binaries for MSDP Scaleout (without environment operators or Helm charts)
- Deploying MSDP Scaleout
- Managing MSDP Scaleout
- MSDP Scaleout maintenance
Deploying the operators
To perform these steps, log on to the Linux workstation or VM where you have extracted the TAR file.
To deploy the operators
- Use the following command to save the operators chart values to a file:
helm show values operators-<version>.tgz > operators-values.yaml
- Use the following command to edit the chart values to fit your requirements:
vi operators-values.yaml
- Execute the following command to deploy the operators:
helm upgrade --install operators operators-<version>.tgz -f operators-values.yaml --create-namespace --namespace netbackup-operator-system
Or
If using the OCI container registry, use the following command:
helm upgrade --install operators oci://abcd.veritas.com:5000/helm-charts/operators --version <version> -f operators-values.yaml --create-namespace --namespace netbackup-operator-system
Following is the output of the above command:
$ helm show values operators-11.1.x.x.xxxx.tgz > operators-values.yaml
$
$ vi operators-values.yaml
$
$ helm upgrade --install operators operators-11.1.x.x.xxxx.tgz \
>     -f operators-values.yaml \
>     --create-namespace \
>     --namespace netbackup-operator-system
Release "operators" does not exist. Installing it now.
NAME: operators
LAST DEPLOYED: Tue Feb 27 00:01:29 2024
NAMESPACE: netbackup-operator-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
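After the release is installed, you can confirm that it deployed and that the operator pods started. The following commands are a sketch, not part of the original procedure; pod names vary by release:
helm status operators --namespace netbackup-operator-system
kubectl get pods --namespace netbackup-operator-system
All operator pods should eventually report a Running status.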
Following is an example of the operators-values.yaml file, which includes the parameter for private registry support:
flexsnap-operator:
  image:
    name: veritas/flexsnap-deploy
    pullPolicy: Always
    tag: 11.1.0.0-1001-ar1
  namespace:
    labels: {}
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: nbuxpool
  replicas: 1
global:
  containerRegistry: name.azurecr.io
  operatorNamespace: netbackup-operator-system
  platform: aks
  storage:
    aks:
      storageAccountName: null
      storageAccountRG: null
    eks:
      fileSystemId: fs-id
  timezone: null
msdp-operator:
  corePattern: /core/core.%e.%p.%t
  image:
    name: msdp-operator
    pullPolicy: Always
    tag: 21.1-0002-ar1
  logging:
    age: 28
    debug: false
    num: 20
  namespace:
    labels:
      control-plane: controller-manager
  nodeSelector: {}
  replicas: 2
  resources:
    limits: {}
    requests:
      cpu: 150m
      memory: 150Mi
  storageClass:
    name: nb-disk-premium
    size: 5Gi
nb-operator:
  flexsnap-operator:
    image:
      tag: 11.1.0.0-1001-ar1
  image:
    name: netbackup/operator
    pullPolicy: Always
    tag: 11.1-0004-ar1
  loglevel:
    value: "0"
  msdp-operator:
    image:
      tag: 21.1-0002-ar1
  namespace:
    labels:
      nb-control-plane: nb-controller-manager
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: nbuxpool
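Instead of editing the full values file, you can also keep a small override file and pass it with an additional -f flag; Helm applies later files on top of earlier ones. The following fragment is a sketch only, using keys from the example above; the registry name and node selector values are placeholders for your environment:
global:
  containerRegistry: myregistry.azurecr.io
  platform: aks
flexsnap-operator:
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: nbuxpool
It would be applied with, for example:
helm upgrade --install operators operators-<version>.tgz -f operators-values.yaml -f my-overrides.yaml --create-namespace --namespace netbackup-operator-system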