NetBackup™ Deployment Guide for Kubernetes Clusters
Config-Checker utility
The Config-Checker utility performs checks on the deployment environment to verify that the environment meets the requirements, before starting the primary server and media server deployments.
How the Config-Checker works:
RetainReclaimPolicy check:
This check verifies that the storage classes used for PVC creation in the CR have the reclaim policy set to Retain. The check fails if any of the storage classes does not have the Retain reclaim policy.
For more information, see the 'Persistent Volumes Reclaiming' section of the Kubernetes Documentation.
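The logic of this check can be pictured with a minimal sketch. It assumes the storage class objects have already been fetched as dictionaries (for example, the items returned by kubectl get storageclass -o json) and reads the standard StorageClass fields metadata.name and reclaimPolicy. The function name and the sample class names are hypothetical, and this is an illustration only, not the Config-Checker's actual implementation.

```python
def find_non_retain_classes(storage_classes, used_names):
    """Return the names of storage classes used in the CR that do not have
    the Retain reclaim policy. A class that cannot be found is also reported."""
    by_name = {sc["metadata"]["name"]: sc for sc in storage_classes}
    offenders = []
    for name in used_names:
        sc = by_name.get(name)
        # Kubernetes treats an omitted reclaimPolicy as Delete.
        policy = sc.get("reclaimPolicy", "Delete") if sc else None
        if policy != "Retain":
            offenders.append(name)
    return offenders


# Hypothetical sample data for illustration.
sample_classes = [
    {"metadata": {"name": "nb-retain"}, "reclaimPolicy": "Retain"},
    {"metadata": {"name": "nb-delete"}, "reclaimPolicy": "Delete"},
]
print(find_non_retain_classes(sample_classes, ["nb-retain", "nb-delete"]))
```

An empty result corresponds to the check passing; any returned name corresponds to a failure.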
MinimumVolumeSize check:
This check verifies that the PVC storage capacity meets the minimum required volume size for each volume in the CR. The check fails if any of the volume capacity sizes does not meet the requirements.
Following are the minimum volume size requirements:
Primary server:
Data volume size: 30Gi
Catalog volume size: 100Gi
Log volume size: 30Gi
Media server:
Data volume size: 50Gi
Log volume size: 30Gi
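The minimum sizes listed above can be compared against requested capacities with a short sketch. It assumes capacities are given as Kubernetes quantity strings such as 30Gi and handles only a few common suffixes; the table name, function names, and volume keys are hypothetical and do not reflect how the Config-Checker is actually written.

```python
import re

# Multipliers for a subset of Kubernetes quantity suffixes (sketch only).
_UNITS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

# Minimum volume sizes taken from the list above.
MINIMUMS = {
    ("primary", "data"): "30Gi",
    ("primary", "catalog"): "100Gi",
    ("primary", "log"): "30Gi",
    ("media", "data"): "50Gi",
    ("media", "log"): "30Gi",
}


def to_bytes(quantity):
    """Convert a quantity string such as '30Gi' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def undersized_volumes(server, requested):
    """Return the volumes whose requested capacity is below the minimum.

    server is 'primary' or 'media'; requested maps a volume name
    ('data', 'catalog', 'log') to its capacity string.
    """
    failures = []
    for volume, capacity in requested.items():
        minimum = MINIMUMS.get((server, volume))
        if minimum is not None and to_bytes(capacity) < to_bytes(minimum):
            failures.append(volume)
    return failures


print(undersized_volumes("primary", {"data": "30Gi", "catalog": "50Gi", "log": "30Gi"}))
```

In this example only the catalog volume is reported, because 50Gi is below the 100Gi minimum for the primary server.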
Provisioner check:
EKS-specific only
Primary server: This check verifies that the storage type provided for the data and log volumes is Amazon Elastic Block Store (Amazon EBS). If any other driver type is used, the Config-Checker fails.
Media server: This check verifies that the storage type provided for the data and log volumes is Amazon Elastic Block Store (Amazon EBS). The Config-Checker fails if this requirement is not met for the media server.
AKS-specific only
This check verifies that the provisioner type used in defining the storage class is , for the volumes in Media servers. If not the Config-Checker will fail. This check verifies that the provisioner type used in defining the storage class is not for the volumes in Media servers. That is data and log volumes in case of Media server.
(EKS-specific only) AWS Load Balancer Controller add-on check:
This check verifies whether the AWS Load Balancer Controller add-on is installed in the cluster. The controller is required for the load balancer in the cluster. If this check fails, you must deploy the AWS Load Balancer Controller add-on.
Cluster Autoscaler
The autoscaler is required for autoscaling in the cluster. If the autoscaler is not configured, the Config-Checker displays a warning message and continues with the deployment of the NetBackup servers.
(EKS-specific only) This check verifies whether the AWS Autoscaler add-on is installed in the cluster. For more information, refer to the 'Autoscaling' section of the Amazon EKS User Guide.
Volume expansion check:
This check verifies that the storage class specified for the primary server data and log volumes and for the media server data and log volumes has AllowVolumeExpansion = true. If this check fails, the Config-Checker displays a warning message and continues with the deployment of the NetBackup media servers.
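Some of the checks described above stop the deployment when they fail, while the Cluster Autoscaler and volume expansion checks only produce a warning. The following minimal sketch shows one way to model that distinction. The check labels ClusterAutoscaler and VolumeExpansion, the function name, and the result format are all hypothetical; the sketch only illustrates the behavior described in this section.

```python
# Checks that, per this section, only produce a warning when not satisfied.
WARNING_ONLY = {"ClusterAutoscaler", "VolumeExpansion"}


def summarize(results):
    """Summarize check results.

    results maps a check label to True (passed) or False (not passed).
    Returns (status, failed_checks, warnings).
    """
    failed = sorted(
        name for name, ok in results.items()
        if not ok and name not in WARNING_ONLY
    )
    warnings = sorted(
        name for name, ok in results.items()
        if not ok and name in WARNING_ONLY
    )
    status = "Failed" if failed else "Success"
    return status, failed, warnings


print(summarize({
    "RetainReclaimPolicy": True,
    "MinimumVolumeSize": True,
    "VolumeExpansion": False,
}))
```

Here the overall result is still Success, with VolumeExpansion reported as a warning, because no mandatory check failed.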
Note the following points.
The Config-Checker runs as a separate job in the Kubernetes cluster for each of the primary server and media server CRs. Each job creates a pod in the operator namespace.
Note:
The Config-Checker pod is deleted after 4 hours.
Execution summary of the Config-Checker can be retrieved from the Config-Checker pod logs using the kubectl logs <configchecker-pod-name> -n <operator-namespace> command.
This summary can also be retrieved from the operator pod logs using the kubectl logs <operator-pod-name> -n <operator-namespace> command.
Following are the Config-Checker modes that can be specified in the Primary and Media CR:
Default: This mode executes the Config-Checker. If the execution is successful, the deployment of the Primary and Media CRs starts.
Note:
This is the default mode of the Config-Checker unless it is changed explicitly through the CR specs.
Dryrun: This mode only executes the Config-Checker to verify the configuration requirements but does not start the CR deployment.
Skip: This mode skips the Config-Checker execution and directly starts the deployment of the respective CR.
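The effect of the three modes can be summarized in a small decision function. This is a sketch of the behavior described above, not the operator's code; the function name and its return format are hypothetical.

```python
def plan(mode, checks_pass=None):
    """Return (run_checker, start_deployment) for a Config-Checker mode.

    checks_pass is the outcome of the checks and only matters in default mode.
    """
    mode = mode.lower()
    if mode == "skip":
        # The checker is not executed and the deployment starts directly.
        return (False, True)
    if mode == "dryrun":
        # The checker runs, but the CR deployment is never started.
        return (True, False)
    if mode == "default":
        # The deployment starts only if the checker execution succeeds.
        return (True, bool(checks_pass))
    raise ValueError(f"unknown Config-Checker mode: {mode!r}")


for mode, outcome in [("default", True), ("default", False), ("dryrun", True), ("skip", None)]:
    print(mode, outcome, plan(mode, outcome))
```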
Status of the Config-Checker can be retrieved from the primary server and media server CRs by using the kubectl describe <PrimaryServer/MediaServer> <CR name> -n <namespace> command.
For example, kubectl describe primaryservers environment-sample -n test
Following are the Config-Checker statuses:
Success: Indicates that all the mandatory config checks have successfully passed.
Failed: Indicates that some of the config checks have failed.
Running: Indicates that the Config-Checker execution is in progress.
Skip: Indicates that the Config-Checker is not executed because the configcheckmode specified in the CR is Skip.
If the Config-Checker execution status is Failed, you can check the Config-Checker job logs using kubectl logs <configchecker-pod-name> -n <operator-namespace>. Review the error codes and error messages pertaining to the failure and update the respective CR with the correct configuration details to resolve the errors.
For more information about the error codes, refer to NetBackup™ Status Codes Reference Guide.
If Config-Checker ran in mode and you want to run it again with the same values in the Primary or Media server YAML as provided earlier, you must delete the respective Primary or Media server CR and then apply it again.
If it is the primary server CR, delete it using the kubectl delete -f <environment.yaml> command.
Or
If it is the media server CR, edit the Environment CR by removing the mediaServer section from the environment.yaml file. Before removing the mediaServer section, save its content and note its location in the file. After removing the section, apply the Environment CR using the kubectl apply -f <environment.yaml> command. Then add the saved content back at the correct location, save the file, and apply it again using the kubectl apply -f <environment.yaml> command.