Description
How to Perform a Cloudscale Pre-Check for NetBackup on Elastic Kubernetes Service (EKS).
- This document outlines the set of commands required to validate the configuration of Elastic Kubernetes Service (EKS) clusters and their compatibility with Veritas NetBackup before deployment or upgrade.
- These checks ensure that all required resources are available and compatible with NetBackup components.
Step 1: Cluster and Client Validation
1. Check the Kubernetes Version:
- Command: kubectl version
- Ensures the Kubernetes version is compatible with NetBackup.
- Refer to: https://www.veritas.com/support/en_US/article.100040093
2. List All Nodes:
- Command: kubectl get nodes
- Confirms nodes are in Ready state and suitable for workload deployment.
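On larger clusters it can be easier to filter the node list than to read it by eye. The following is a minimal sketch rather than part of the official procedure: `not_ready_nodes` is a hypothetical helper name, and it assumes the default `kubectl get nodes` column layout (NAME, STATUS, ...).

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# names of nodes whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are also reported.
not_ready_nodes() {
  awk '$2 != "Ready" {print $1}'
}

# Usage (assumes kubectl is configured for the target cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result suggests every node reports Ready.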
Step 2: Check the Helm Charts Installed on the System
- Command: helm ls -A
- For version 10.4, the deployment includes cert-manager, trust-manager, operators, and PostgreSQL Helm charts.
- Starting from version 10.5, fluentbit will also be included alongside the existing charts.
- Refer to:
Step 3: Namespace and Resource Validation
1. List All Namespaces:
- Command: kubectl get namespaces
- Identifies available namespaces including NetBackup and supporting components.
2. List All Resources in NetBackup Namespace:
- Command: kubectl get all -n <Netbackup Namespace>
3. List All Resources in NetBackup Operator Namespace:
- Command: kubectl get all -n <Netbackup Operator Namespace>
4. List All Resources with Node and IP Details:
- Command: kubectl get all -n <Namespace_Name> -o wide
- Provides extended resource info, aiding in node and IP mapping for diagnostics.
After executing the above commands, verify that all expected namespaces are present in the environment.
- Additionally, confirm that all NetBackup-related namespaces and the Operator are in a healthy state.
- Example namespaces include: Primary, Media, NSM, cert-manager, trust-manager, system, etc.
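One way to spot unhealthy workloads quickly is to filter the pod listing for anything that is not fully ready. The sketch below is illustrative only: `unhealthy_pods` is a hypothetical helper, and it assumes the default `kubectl get pods` columns (NAME, READY, STATUS, ...).

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints
# NAME, READY and STATUS for pods that are neither Completed nor
# Running with all containers ready.
unhealthy_pods() {
  awk '{
    if ($3 == "Completed") next          # finished jobs are fine
    split($2, r, "/")                    # READY column, e.g. "1/2"
    if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n <Namespace_Name> --no-headers | unhealthy_pods
```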
Step 4: Supporting Component Validation
1. Cert-Manager Pod and Deployment Validation:
- Command: kubectl get pods -n cert-manager -o wide
- Verifies that cert-manager is installed and using a compatible image.
- Command: kubectl -n cert-manager get deployment cert-manager -o=jsonpath="{.spec.template.spec.containers[0].image}"
Note: cert-manager version 1.13.3 is compatible with all versions starting from NetBackup 10.3.0.1.
2. Trust-Manager Pod and Deployment Validation:
- Command: kubectl get pods -n trust-manager -o wide
- Ensures trust-manager is running and image version is compatible.
- Command: kubectl get deployment trust-manager -n <namespace> -o=jsonpath="{.spec.template.spec.containers[0].image}"
Note: trust-manager version 0.7.0 is compatible with all versions starting from NetBackup 10.3.0.1.
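The jsonpath commands above return a full image reference, so comparing versions means isolating the tag. The following sketch shows one way to do that. `image_tag` is a hypothetical helper, and the exact tag string (for example, whether it carries a `v` prefix) depends on how the image was pushed to your registry.

```shell
# Prints the tag portion of an image reference such as
# "registry:5000/path/name:tag". Returns 1 if the reference has no tag.
image_tag() {
  ref=${1%%@*}      # drop any "@sha256:..." digest suffix
  name=${ref##*/}   # keep the last path component, e.g. "name:tag"
  case $name in
    *:*) printf '%s\n' "${name##*:}" ;;
    *)   return 1 ;;
  esac
}

# Usage (the expected tag "v1.13.3" is an assumption; adjust as needed):
#   img=$(kubectl -n cert-manager get deployment cert-manager \
#           -o=jsonpath='{.spec.template.spec.containers[0].image}')
#   [ "$(image_tag "$img")" = "v1.13.3" ] && echo "cert-manager tag matches"
```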
Step 5: ECR (Elastic Container Registry) Image Validation
1. List All Repositories in ECR:
- Command: aws ecr describe-repositories --query 'repositories[*].repositoryName' --output text
2. List Tags for a Specific Image Repository:
- Command: aws ecr list-images --repository-name <repository-name> --query 'imageIds[*].imageTag' --output text
Verify the availability of required NetBackup container images and versions.
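When a repository holds many tags, a small filter can confirm that a specific tag is present. This is a sketch under stated assumptions: `has_tag` is a hypothetical helper, and it relies on `--output text` emitting the tags separated by tabs or spaces.

```shell
# Reads whitespace-separated image tags on stdin and succeeds (exit 0)
# only if one of them exactly equals the tag given as $1.
has_tag() {
  wanted=$1
  tr '\t ' '\n\n' | grep -Fxq -- "$wanted"
}

# Usage (<required-tag> is a placeholder for the NetBackup version tag):
#   aws ecr list-images --repository-name <repository-name> \
#     --query 'imageIds[*].imageTag' --output text | has_tag <required-tag> \
#     && echo "tag found" || echo "tag missing"
```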
Step 6: NetBackup Custom Resources and Secrets
1. Export All Resources in YAML:
- Command: kubectl get all -n <Namespace_Name> -o yaml > environment_upgrade_check.yaml
- Saves current state for review or rollback purposes.
2. Extract Environment Custom Resources and validate them.
- Command: kubectl get environment -n <netbackup-namespace> -o yaml > environment.yaml
- While validating the existing configuration, ensure that no restricted keywords are used, as per the guidelines: https://www.veritas.com/content/support/en_US/article.100074300.html
- Additionally, when upgrading from NetBackup 10.4 to a later version, ensure that the environment.yaml file includes the dbsecret name.
3. List Bundle Custom Resources
- Command: kubectl get bundle -n <Namespace_Name>
4. List Secrets in NetBackup Namespace
- Command: kubectl get secrets -n <Namespace_Name>
Step 7: Validate DBSecret Password
1. Decode the dbadminpassword key of the secret to validate that the access credentials are correctly configured.
- Command: kubectl get secret <DBSecret_Name> -n netbackup -o jsonpath='{.data.dbadminpassword}' | base64 --decode
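The command above prints the decoded password to the terminal. If you only need to confirm that the key exists and decodes cleanly, a check along the following lines avoids displaying it. This is a sketch: `secret_is_set` is a hypothetical helper, and it assumes a `base64` binary that accepts `--decode`.

```shell
# Reads a base64 string on stdin and succeeds only if it decodes
# without error to a non-empty value. Nothing is printed.
secret_is_set() {
  decoded=$(base64 --decode 2>/dev/null) || return 1
  [ -n "$decoded" ]
}

# Usage:
#   kubectl get secret <DBSecret_Name> -n netbackup \
#     -o jsonpath='{.data.dbadminpassword}' | secret_is_set \
#     && echo "password present" || echo "password missing or invalid"
```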
Step 8: Validate Maximum Pods Allowed per Node in Each Node Pool
The following requirements apply from NetBackup 10.5 onward, as the Primary Server has been decoupled and Fluent Bit has been introduced:
- Primary Pool – m5.4xlarge
- Media Pool – m5.xlarge
- MSDP Pool – r5.2xlarge
- cpdatapool (Snapshot Manager) – t3.large
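To compare the running cluster against the instance types listed above, the node group, instance type, and allocatable pod count can be read from each node. The sketch below assumes EKS managed node groups (which set the `eks.amazonaws.com/nodegroup` label) and the standard `node.kubernetes.io/instance-type` label. `summarize_pools` is a hypothetical helper.

```shell
# Reads lines of "NODE GROUP INSTANCE_TYPE MAX_PODS" on stdin and prints
# one line per distinct node group / instance type / max pods combination.
summarize_pools() {
  awk '{print $2, $3, $4}' | sort -u
}

# Usage:
#   kubectl get nodes --no-headers -o custom-columns='NODE:.metadata.name,GROUP:.metadata.labels.eks\.amazonaws\.com/nodegroup,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,MAXPODS:.status.allocatable.pods' \
#     | summarize_pools
```

More than one line for the same node group would indicate mixed instance types or pod limits within that pool.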
Step 9: Verify Total Size of Datamover Secret in Flexsnap-Certauth Pod Directories Does Not Exceed 3MB Before Upgrade
1. Identify the Pod
- List the pod name first:
- kubectl get pods -n <namespace> | grep flexsnap-certauth
2. Access the Pod Shell
- Replace <pod-name> with the actual pod name:
- kubectl exec -it <pod-name> -n <namespace> -- bash
3. Run:
- du -sb /cloudpoint/eca /cloudpoint/openv/var/vxss /cloudpoint/openv/var/webtruststore | awk '{sum+=$1} END {print "Total size:", sum, "bytes"; if (sum>3145728) print " Exceeds 3MB limit!"; else print " Within limit."}'
If the total is within the limit, the output resembles:
Total size: 114943 bytes
Within limit.
You can proceed with the upgrade.
4. If the total exceeds 3MB, the output includes:
Exceeds 3MB limit!
Proceed to locate large .crl and .log files in the same directories by running the following commands inside the pod:
a. Check .crl files
- find /cloudpoint/eca /cloudpoint/openv/var/vxss /cloudpoint/openv/var/webtruststore -type f -name "*.crl" -exec du -h {} + | sort -hr | head -20
b. Check .log files
- find /cloudpoint/eca /cloudpoint/openv/var/vxss /cloudpoint/openv/var/webtruststore -type f -name "*.log" -exec du -h {} + | sort -hr | head -20
These commands list the 20 largest files (in descending order of size) so you can identify which ones are consuming most of the space.
5. Move those files out of the pod before the upgrade; otherwise, they might cause issues with the Snapshot Manager upgrade.
- Note: Do not delete them until the upgrade is complete.
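For scripted pre-checks, the size test in this step can be expressed so that it sets an exit status instead of only printing a message. The following is a sketch, not the official check: `within_limit` is a hypothetical helper that consumes `du -sb` output and defaults to the same 3145728-byte threshold used above.

```shell
# Reads `du -sb` output (size<TAB>path per line) on stdin, prints the
# total, and exits 0 if the total is at or below the limit, 1 otherwise.
# $1 is the limit in bytes (default: 3145728, i.e. 3 MiB).
within_limit() {
  limit=${1:-3145728}
  awk -v limit="$limit" '
    {sum += $1}
    END {print "Total size:", sum + 0, "bytes"; exit (sum > limit ? 1 : 0)}'
}

# Usage (runs du inside the pod, evaluates the result locally):
#   kubectl exec <pod-name> -n <namespace> -- \
#     du -sb /cloudpoint/eca /cloudpoint/openv/var/vxss /cloudpoint/openv/var/webtruststore \
#     | within_limit && echo "Within limit." || echo "Exceeds 3MB limit!"
```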
Step 10: Verify No Backup from Snapshot Jobs Are Running Before Upgrade
- Purpose: Ensure that no ongoing backup-from-snapshot (datamover) jobs are running during the upgrade.
- Running datamover jobs can interfere with the Snapshot Manager upgrade process and may lead to job or configuration inconsistencies.
1. Check for any active Datamover pods:
Run the following commands to identify if any datamover pods or jobs are active in the namespace:
- kubectl get pods -n <namespace>
- kubectl get jobs -n <namespace>
Look for pod names containing “datamover” or similar backup job identifiers.
2. If any Datamover pods or jobs are found:
Delete the corresponding jobs to ensure the environment is clean before the upgrade:
- kubectl delete jobs <datamover-job-name> -n <namespace>
⚠️ Ensure that no active backups are required before deletion.
- This step should only be performed during a planned upgrade window.
3. Re-verify:
Once deletion is complete, re-run:
- kubectl get pods -n <namespace>
- kubectl get jobs -n <namespace>
Confirm that no datamover pods or jobs are listed.
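In a busy namespace, filtering the listing for datamover resources is less error-prone than scanning it manually. The sketch below is illustrative: `active_datamovers` is a hypothetical helper, and the "datamover" substring match follows the naming hint given in this step.

```shell
# Reads `kubectl get ... --no-headers` output on stdin and prints the
# names of resources whose name contains "datamover" (case-insensitive).
active_datamovers() {
  awk 'tolower($1) ~ /datamover/ {print $1}'
}

# Usage (pods and jobs in one call; names are prefixed with their kind):
#   kubectl get pods,jobs -n <namespace> --no-headers | active_datamovers
```

No output suggests that no datamover pods or jobs remain.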
Step 11: Verify That All Certificates Are Valid in the flexsnap-certauth Pod
- Command: flexsnap-config certs
- Purpose: Verify that all certificates are valid and not expired before performing the upgrade.
- If any certificates are found to be expired, contact Cohesity Support for assistance.
Step 12: Ensure No service.tags Are Present in the Environment Before the Upgrade
This step must be performed before starting the actual upgrade:
Edit the environment using the following command:
- kubectl edit environment <env-name> -n <namespace>
Remove any service tags that have been added.
- Once the changes are saved, all pods associated with those service tags will automatically restart.
- After the pods are up, proceed with the upgrade process.
Step 13: Ensure the IAM role used by the Snapshot Manager has the eks:ListNodeGroups permission assigned.
Without this permission, Snapshot operations, particularly backups taken from snapshots, may fail or experience operational issues.
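One possible way to check the permission is the IAM policy simulator, which reports an evaluation decision for a given role and action. The sketch below is an assumption about how you might script that check, not a documented NetBackup procedure. `is_allowed` is a hypothetical helper, `<snapshot-manager-role-arn>` is a placeholder, and the caller needs permission to run the simulator. IAM action names are case-insensitive, so `eks:ListNodegroups` matches the permission named in this step.

```shell
# Reads a single IAM evaluation decision on stdin and succeeds only if
# it is "allowed" (other values include implicitDeny and explicitDeny).
is_allowed() {
  read -r decision || true
  [ "$decision" = "allowed" ]
}

# Usage:
#   aws iam simulate-principal-policy \
#     --policy-source-arn <snapshot-manager-role-arn> \
#     --action-names eks:ListNodegroups \
#     --query 'EvaluationResults[0].EvalDecision' --output text | is_allowed \
#     && echo "permission present" || echo "permission missing"
```

The simulator evaluates the policies attached to the role, so treat the result as an indication rather than a guarantee of runtime behavior.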