NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded

Article: 100055614
Last Published: 2023-09-19
Product(s): CloudPoint, NetBackup & Alta Data Protection

Problem

NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded.

Error Message

Describing a pod that is stuck in the Pending state shows events similar to the following:

# kubectl describe pod <pod_in_pending_state> -n netbackup
Events:

  Type     Reason            Age                    From               Message

  ----     ------            ----                   ----               -------

  Warning  FailedScheduling  79s (x15182 over 13d)  default-scheduler  0/10 nodes are available: 1 Too many pods, 1 node(s) had taint {group: cpdata}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {group: media}, that the pod didn't tolerate, 4 node(s) had taint {group: msdp}, that the pod didn't tolerate.
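To see how widespread the problem is, you can list every pod stuck in the Pending phase and the associated scheduling-failure events. This is a minimal sketch; the netbackup namespace is taken from the example above:

```shell
# List all pods currently in the Pending phase in the netbackup namespace
kubectl get pods -n netbackup --field-selector=status.phase=Pending

# Show the FailedScheduling events explaining why they cannot be placed
kubectl get events -n netbackup --field-selector=reason=FailedScheduling
```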

Cause

The maximum number of pods per node was exceeded. This limit varies depending on the instance type.

Solution

Identify the maximum number of pods the node can handle by describing one of the nodes:

# kubectl describe node <node-name>

The example output is as follows.

...
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           76224326324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7244720Ki
  pods:                        110

...

In the previous output, 110 is the maximum number of pods that Kubernetes will deploy to the node.
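To compare that limit against actual usage, you can print each node's allocatable pod capacity and count the pods currently scheduled on a given node. A sketch (the node name is a placeholder):

```shell
# Allocatable pod capacity for every node, one line per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.pods}{"\n"}{end}'

# Number of pods currently scheduled on a specific node (across all namespaces)
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> --no-headers | wc -l
```

If the pod count is at or near the allocatable value, the node cannot accept the pending pods.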

AWS also provides a script that can be run to identify the maximum number of pods per instance type:

https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html#determine-max-pods
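For EKS clusters using the default AWS VPC CNI, the limit the script reports follows a documented formula based on the instance type's network interface capacity. A worked example (the ENI and IP counts shown are those of an m5.large instance):

```shell
# AWS VPC CNI default max-pods formula:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# Example values for an m5.large: 3 ENIs, 10 IPv4 addresses per ENI
enis=3
ips_per_eni=10
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "$max_pods"   # 29
```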

There are multiple ways to address this, but the bottom line is that the available pod capacity must be increased.  This can be done by adding a node to the node pool, or by changing the instance type to one that can handle more pods.
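As an example of the first option, an EKS managed node group can be scaled out with eksctl. A sketch; the cluster name, node group name, and target node count below are placeholders:

```shell
# Scale an EKS managed node group to 4 nodes
# (requires eksctl and valid AWS credentials; names are placeholders)
eksctl scale nodegroup --cluster=<cluster-name> --name=<nodegroup-name> --nodes=4
```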
