Cohesity Cloud Scale Technology Manual Deployment Guide for Kubernetes Clusters
Prerequisites for Cloud Scale deployment
Ensure that the following steps are performed before deploying the operators:
- Install cert-manager by using the following command:
helm repo add jetstack https://charts.jetstack.io
helm repo update jetstack
helm upgrade -i -n cert-manager cert-manager jetstack/cert-manager \
  --version 1.18.2 \
  --set webhook.timeoutSeconds=30 \
  --set installCRDs=true \
  --wait --create-namespace
For details, see cert-manager Documentation.
- Create the NetBackup namespace by using the following command:
kubectl create ns netbackup
- Install trust-manager by using the following command:
kubectl create namespace trust-manager
helm upgrade -i -n trust-manager trust-manager jetstack/trust-manager --set app.trust.namespace=netbackup --version v0.19.0 --wait
For details, see trust-manager Documentation.
- (If using private registry) Create a Kubernetes secret for using the private registries as follows:
Create the operator namespace by using the following command:
kubectl create ns netbackup-operator-system
kubectl create secret docker-registry <secret-name> \
  --namespace <namespace> \
  --docker-server=<container-registry-name> \
  --docker-username=<service-principal-ID> \
  --docker-password=<service-principal-password>
For example (AKS):
kubectl create secret docker-registry demo-secret \
  --namespace netbackup-operator-system \
  --docker-server=cpautomation.azurecr.io \
  --docker-username=c1d03169-6f35-4c29-b527-995bbad3b608 \
  --docker-password=<password_here>
- (For Cloud Scale) Create a Kubernetes secret for using the private registries as follows:
kubectl create secret docker-registry <secret-name> \
  --namespace <namespace> \
  --docker-server=<container-registry-name> \
  --docker-username=<service-principal-ID> \
  --docker-password=<service-principal-password>
For example (AKS):
kubectl create secret docker-registry demo-secret \
  --namespace netbackup \
  --docker-server=cpautomation.azurecr.io \
  --docker-username=c1d03169-6f35-4c29-b527-995bbad3b608 \
  --docker-password=<password_here>
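Because the same registry secret is created once per namespace, the two commands above can be wrapped in a small loop. The following is a minimal sketch, not part of the product tooling: the function name create_pull_secret and the environment variables REGISTRY_USERNAME, REGISTRY_PASSWORD, and KUBECTL are assumptions introduced here for illustration.

```shell
#!/bin/sh
# create_pull_secret NAME SERVER NAMESPACE...
# Hypothetical helper: creates the same docker-registry secret in each namespace.
# Credentials come from REGISTRY_USERNAME / REGISTRY_PASSWORD (assumed variable names)
# so that they are not typed inline on the command line.
create_pull_secret() {
  name=$1
  server=$2
  shift 2
  kubectl_cmd=${KUBECTL:-kubectl}   # override only for dry runs, e.g. KUBECTL=echo

  if [ -z "$REGISTRY_USERNAME" ] || [ -z "$REGISTRY_PASSWORD" ]; then
    echo "REGISTRY_USERNAME and REGISTRY_PASSWORD must be set" >&2
    return 1
  fi

  for ns in "$@"; do
    $kubectl_cmd create secret docker-registry "$name" \
      --namespace "$ns" \
      --docker-server="$server" \
      --docker-username="$REGISTRY_USERNAME" \
      --docker-password="$REGISTRY_PASSWORD" || return 1
  done
}
```

For example, after exporting the two credential variables: create_pull_secret demo-secret cpautomation.azurecr.io netbackup-operator-system netbackup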
This section provides step-by-step instructions for deploying cert-manager and trust-manager in an air-gapped or partially air-gapped Azure Kubernetes Service (AKS) environment. It enables installation without direct internet access, using JFrog Artifactory as the container registry (adaptable for ACR or ECR). The tools are installed in separate namespaces (cert-manager and trust-manager) for compatibility, with trust-manager configured to manage trust bundles in the NetBackup namespace.
Prerequisites
Internet-connected and air-gapped environments
Secure file transfer method between environments
Access to a private container registry (for example, JFrog Artifactory)
Docker/Podman and Helm installed on both environments
kubectl configured for the AKS cluster
Registry credentials
Air-Gapped installation procedure
- Gather Helm charts locally (Internet-connected environment):
Download the cert-manager and trust-manager charts:
# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io --force-update
# Update your local Helm repository indexes
helm repo update
# Pull the cert-manager chart
helm pull jetstack/cert-manager --version v1.17.0 --untar --untardir ./cert-manager-chart
# Pull the trust-manager chart (pinned to the version used in the rest of this procedure)
helm pull jetstack/trust-manager --version v0.16.0 --untar --untardir ./trust-manager-chart
This creates the folders ./cert-manager-chart and ./trust-manager-chart containing the Helm chart files.
Identify the container images in each chart:
Method 1: Generate and examine the manifests:
helm template ./cert-manager-chart/cert-manager --namespace cert-manager > cert-manager-manifest.yaml
helm template ./trust-manager-chart/trust-manager --namespace trust-manager > trust-manager-manifest.yaml
# Find all images used in the manifests
grep "image:" cert-manager-manifest.yaml
grep "image:" trust-manager-manifest.yaml
Example output:
# cert-manager images
image: "quay.io/jetstack/cert-manager-cainjector:v1.17.0"
image: "quay.io/jetstack/cert-manager-controller:v1.17.0"
image: "quay.io/jetstack/cert-manager-webhook:v1.17.0"
image: "quay.io/jetstack/cert-manager-startupapicheck:v1.17.0"
# trust-manager images
image: "quay.io/jetstack/trust-pkg-debian-bookworm:20230311.0"
image: "quay.io/jetstack/trust-manager:v0.16.0"
Method 2: Examine the values.yaml files directly:
For cert-manager: Look for .image.repository, .image.tag, and so on.
For trust-manager: Look for app.image.repository, app.image.tag, and so on.
The default images are typically pulled from quay.io/jetstack/.
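The grep output of Method 1 can be reduced to a clean, de-duplicated image list that the later pull, tag, and push steps can reuse. The following is a minimal sketch under stated assumptions: the function name list_images and the output file images.txt are hypothetical, and the manifest file names are the ones generated in Method 1.

```shell
#!/bin/sh
# list_images FILE...
# Hypothetical helper: prints each distinct image reference that appears on an
# "image:" line of the given rendered manifests, without quotes or indentation.
list_images() {
  grep -hE '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]' "$@" \
    | sed -E -e 's/^[[:space:]-]*image:[[:space:]]*//' \
             -e "s/[\"']//g" \
             -e 's/[[:space:]]+$//' \
    | sort -u
}

# Example (file names from Method 1; images.txt is an assumed name):
#   list_images cert-manager-manifest.yaml trust-manager-manifest.yaml > images.txt
```

Keeping the list in one file means the image names are typed once rather than repeated in every docker command.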
- Pull and save the images (Internet-connected environment):
# Pull cert-manager images
docker pull quay.io/jetstack/cert-manager-controller:v1.17.0
docker pull quay.io/jetstack/cert-manager-webhook:v1.17.0
docker pull quay.io/jetstack/cert-manager-cainjector:v1.17.0
docker pull quay.io/jetstack/cert-manager-startupapicheck:v1.17.0
# Pull trust-manager images
docker pull quay.io/jetstack/trust-manager:v0.16.0
docker pull quay.io/jetstack/trust-pkg-debian-bookworm:20230311.0
# Save all images into a single tarball
docker save \
  quay.io/jetstack/cert-manager-controller:v1.17.0 \
  quay.io/jetstack/cert-manager-webhook:v1.17.0 \
  quay.io/jetstack/cert-manager-cainjector:v1.17.0 \
  quay.io/jetstack/cert-manager-startupapicheck:v1.17.0 \
  quay.io/jetstack/trust-manager:v0.16.0 \
  quay.io/jetstack/trust-pkg-debian-bookworm:20230311.0 \
  -o cert-manager-trust-manager-images.tar
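The pull-and-save step can also be driven from a plain text file with one image reference per line, which avoids repeating each name twice. The following is a sketch only: the function name save_images, the list file, and the DOCKER override variable are assumptions made for illustration (setting DOCKER=podman is one possible use, since either tool is listed in the prerequisites).

```shell
#!/bin/sh
# save_images LIST TARBALL
# Hypothetical helper: pulls every image named in LIST (one reference per line,
# blank lines ignored) and saves them all into a single tarball.
save_images() {
  list=$1
  tarball=$2
  docker_cmd=${DOCKER:-docker}   # assumed override, e.g. DOCKER=podman

  set --                          # collect the image names as positional parameters
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    $docker_cmd pull "$image" || return 1
    set -- "$@" "$image"
  done < "$list"

  if [ "$#" -eq 0 ]; then
    echo "no images listed in $list" >&2
    return 1
  fi
  $docker_cmd save -o "$tarball" "$@"
}
```

For example: save_images images.txt cert-manager-trust-manager-images.tar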
- Copy the files to the offline/restricted host:
Transfer the image tarball:
Transfer the cert-manager-trust-manager-images.tar file to your air-gapped environment using an approved method (for example, SCP, secure file transfer, or physical media).
Transfer the Helm charts:
Transfer the Helm charts to the air-gapped environment using one of the following approaches:
Option 1: Pack and transfer the extracted chart directories:
# Create archive of the chart directories
tar -czf helm-charts.tar.gz ./cert-manager-chart ./trust-manager-chart
# Transfer helm-charts.tar.gz to the air-gapped environment
# Then on the air-gapped host:
tar -xzf helm-charts.tar.gz
Option 2: Create Helm package archives and transfer those:
# Package the charts
helm package ./cert-manager-chart/cert-manager -d ./packages
helm package ./trust-manager-chart/trust-manager -d ./packages
# Transfer the ./packages directory to the air-gapped environment
Use the same secure file transfer method as used for the container images.
- Load and push images into your private registry (Air-Gapped environment):
# Load the images into Docker
docker load -i cert-manager-trust-manager-images.tar
# Log in to your private Artifactory registry
# You'll be prompted to enter your password securely
# If using ACR or ECR, replace the docker login command below with the registry's own login command (for example: az acr login -n myacr)
docker login my-artifactory.mycompany.com -u <your-username>
# Tag the images for your private registry (replace my-artifactory.mycompany.com with your registry URL)
docker tag quay.io/jetstack/cert-manager-controller:v1.17.0 \
  my-artifactory.mycompany.com/jetstack/cert-manager-controller:v1.17.0
docker tag quay.io/jetstack/cert-manager-webhook:v1.17.0 \
  my-artifactory.mycompany.com/jetstack/cert-manager-webhook:v1.17.0
docker tag quay.io/jetstack/cert-manager-cainjector:v1.17.0 \
  my-artifactory.mycompany.com/jetstack/cert-manager-cainjector:v1.17.0
docker tag quay.io/jetstack/cert-manager-startupapicheck:v1.17.0 \
  my-artifactory.mycompany.com/jetstack/cert-manager-startupapicheck:v1.17.0
docker tag quay.io/jetstack/trust-manager:v0.16.0 \
  my-artifactory.mycompany.com/jetstack/trust-manager:v0.16.0
docker tag quay.io/jetstack/trust-pkg-debian-bookworm:20230311.0 \
  my-artifactory.mycompany.com/jetstack/trust-pkg-debian-bookworm:20230311.0
# Push the images to your private registry
docker push my-artifactory.mycompany.com/jetstack/cert-manager-controller:v1.17.0
docker push my-artifactory.mycompany.com/jetstack/cert-manager-webhook:v1.17.0
docker push my-artifactory.mycompany.com/jetstack/cert-manager-cainjector:v1.17.0
docker push my-artifactory.mycompany.com/jetstack/cert-manager-startupapicheck:v1.17.0
docker push my-artifactory.mycompany.com/jetstack/trust-manager:v0.16.0
docker push my-artifactory.mycompany.com/jetstack/trust-pkg-debian-bookworm:20230311.0
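The tag and push commands above all follow one pattern: replace the source registry host (quay.io) with the private registry host and keep the rest of the path. A minimal sketch of that pattern as a loop follows. The function name mirror_images, the list file of image references, and the DOCKER override variable are assumptions for illustration; run docker load and docker login first, as shown above.

```shell
#!/bin/sh
# mirror_images LIST REGISTRY
# Hypothetical helper: for each image reference in LIST (one per line), retags it
# under REGISTRY, keeping the repository path and tag, and pushes the new tag.
mirror_images() {
  list=$1
  registry=$2
  docker_cmd=${DOCKER:-docker}   # assumed override, e.g. DOCKER=podman

  while IFS= read -r image; do
    [ -n "$image" ] || continue
    # Drop everything up to the first "/" (the source registry host, e.g. quay.io)
    target="$registry/${image#*/}"
    $docker_cmd tag "$image" "$target" || return 1
    $docker_cmd push "$target" || return 1
  done < "$list"
}
```

For example: mirror_images images.txt my-artifactory.mycompany.com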
- Create image pull secret and update Helm charts:
Create a Kubernetes secret for registry authentication:
# Create the cert-manager and trust-manager namespaces if they don't exist yet
kubectl create namespace cert-manager --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace trust-manager --dry-run=client -o yaml | kubectl apply -f -
# Create a secret with your registry credentials
# Create secret for cert-manager
kubectl create secret docker-registry artifactory-registry-secret \
  --namespace cert-manager \
  --docker-server=my-artifactory.mycompany.com \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email>
# Create the same secret for trust-manager
kubectl create secret docker-registry artifactory-registry-secret \
  --namespace trust-manager \
  --docker-server=my-artifactory.mycompany.com \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email>
Update the Helm charts to use your private registry using one of the following options:
Option 1: Modify values.yaml files:
For cert-manager, update the following sections in the cert-manager-chart/cert-manager/values.yaml file:
image:
  repository: my-artifactory.mycompany.com/jetstack/cert-manager-controller
  tag: v1.17.0
# ...
webhook:
  image:
    repository: my-artifactory.mycompany.com/jetstack/cert-manager-webhook
    tag: v1.17.0
# ...
cainjector:
  image:
    repository: my-artifactory.mycompany.com/jetstack/cert-manager-cainjector
    tag: v1.17.0
# ...
# Add imagePullSecrets
imagePullSecrets:
  - name: artifactory-registry-secret
For trust-manager, update the following in trust-manager-chart/trust-manager/values.yaml:
app:
  image:
    repository: my-artifactory.mycompany.com/jetstack/trust-manager
    tag: v0.16.0
  trust:
    namespace: netbackup
# ...
pkgImporter:
  image:
    repository: my-artifactory.mycompany.com/jetstack/trust-pkg-debian-bookworm
    tag: "20230311.0"
# Add imagePullSecrets
imagePullSecrets:
  - name: artifactory-registry-secret
Option 2: Use --set parameters during installation:
Instead of modifying the values.yaml files directly, you can use --set parameters during installation. For more information, see the commented lines in step 6 below.
- Connect to and install on your AKS cluster (Air-Gapped environment):
Authenticate to your AKS cluster:
Before installing Helm charts, ensure that your kubectl is properly configured to communicate with your AKS cluster:
# If you have Azure CLI in your air-gapped environment
az login
az account set --subscription <your-subscription-id>
az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster-name>
# OR if you've transferred a kubeconfig file
export KUBECONFIG=/path/to/your/kubeconfig
# Verify connectivity
kubectl cluster-info
kubectl get nodes
Install cert-manager:
Based on how you transferred the Helm charts, use one of the following approaches:
# Create the cert-manager namespace if it doesn't exist yet
kubectl create namespace cert-manager --dry-run=client -o yaml | kubectl apply -f -
Option 1:
# If you transferred the extracted chart directories
helm install cert-manager ./cert-manager-chart/cert-manager \
  --namespace cert-manager \
  --set crds.enabled=true
# If you didn't modify values.yaml, end the line above with a backslash and add these overrides:
# --set image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-controller \
# --set image.tag=v1.17.0 \
# --set webhook.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-webhook \
# --set webhook.image.tag=v1.17.0 \
# --set cainjector.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-cainjector \
# --set cainjector.image.tag=v1.17.0 \
# --set startupapicheck.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-startupapicheck \
# --set startupapicheck.image.tag=v1.17.0 \
# --set imagePullSecrets[0].name=artifactory-registry-secret
Option 2:
# If you transferred packaged charts (.tgz files)
helm install cert-manager ./packages/cert-manager-v1.17.0.tgz \
  --namespace cert-manager \
  --set crds.enabled=true
# If you didn't modify values.yaml, end the line above with a backslash and add these overrides:
# --set image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-controller \
# --set image.tag=v1.17.0 \
# --set webhook.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-webhook \
# --set webhook.image.tag=v1.17.0 \
# --set cainjector.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-cainjector \
# --set cainjector.image.tag=v1.17.0 \
# --set startupapicheck.image.repository=my-artifactory.mycompany.com/jetstack/cert-manager-startupapicheck \
# --set startupapicheck.image.tag=v1.17.0 \
# --set imagePullSecrets[0].name=artifactory-registry-secret
Install trust-manager:
Based on how you transferred the Helm charts, use one of the following approaches:
# Create the trust-manager and netbackup namespaces if they don't exist yet
kubectl create namespace trust-manager --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace netbackup --dry-run=client -o yaml | kubectl apply -f -
Option 1:
# If you transferred the extracted chart directories
helm install trust-manager ./trust-manager-chart/trust-manager \
  --namespace trust-manager \
  --set app.trust.namespace=netbackup \
  --wait
# If you didn't modify values.yaml, end the line above with a backslash and add these overrides:
# --set app.image.repository=my-artifactory.mycompany.com/jetstack/trust-manager \
# --set app.image.tag=v0.16.0 \
# --set pkgImporter.image.repository=my-artifactory.mycompany.com/jetstack/trust-pkg-debian-bookworm \
# --set pkgImporter.image.tag=20230311.0 \
# --set imagePullSecrets[0].name=artifactory-registry-secret
Option 2:
# If you transferred packaged charts (.tgz files)
helm install trust-manager ./packages/trust-manager-v0.16.0.tgz \
  --namespace trust-manager \
  --set app.trust.namespace=netbackup \
  --wait
# If you didn't modify values.yaml, end the line above with a backslash and add these overrides:
# --set app.image.repository=my-artifactory.mycompany.com/jetstack/trust-manager \
# --set app.image.tag=v0.16.0 \
# --set pkgImporter.image.repository=my-artifactory.mycompany.com/jetstack/trust-pkg-debian-bookworm \
# --set pkgImporter.image.tag=20230311.0 \
# --set imagePullSecrets[0].name=artifactory-registry-secret
- Verify the installation:
# Check if cert-manager pods are running
kubectl get pods -n cert-manager
# Check if trust-manager pod is running
kubectl get pods -n trust-manager -l app=trust-manager
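Listing the pods shows their state at one moment. In a script, it can be more convenient to block until the deployments report as available. The following is a sketch only, assuming that both charts install their workloads as Deployments: the function name wait_for_deployments and the KUBECTL and WAIT_TIMEOUT variables are hypothetical, while kubectl wait with --for=condition=Available is a standard kubectl command.

```shell
#!/bin/sh
# wait_for_deployments NAMESPACE...
# Hypothetical helper: blocks until every Deployment in each namespace reports the
# Available condition, or fails once the timeout expires.
wait_for_deployments() {
  kubectl_cmd=${KUBECTL:-kubectl}      # override only for dry runs, e.g. KUBECTL=echo
  timeout=${WAIT_TIMEOUT:-180s}        # assumed default; adjust for slow registries

  for ns in "$@"; do
    $kubectl_cmd wait deployment --all \
      --namespace "$ns" \
      --for=condition=Available \
      --timeout="$timeout" || return 1
  done
}
```

For example: wait_for_deployments cert-manager trust-manager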