NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded

Article: 100055614
Last Published: 2023-09-19
Product(s): CloudPoint, NetBackup

Problem

NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded.

Error Message

Describing a pod that is in the Pending state shows a FailedScheduling event:

# kubectl describe pod <pod_in_pending_state> -n netbackup
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  79s (x15182 over 13d)  default-scheduler  0/10 nodes are available: 1 Too many pods, 1 node(s) had taint {group: cpdata}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {group: media}, that the pod didn't tolerate, 4 node(s) had taint {group: msdp}, that the pod didn't tolerate.
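
To find affected pods without describing each one by hand, the check above can be sketched as two small shell helpers. This is a minimal sketch, not a Veritas-supplied tool: the function names pending_pods and hit_pod_limit are hypothetical, and the namespace defaults to netbackup as in the command above.

```shell
# Hypothetical helper: list pods stuck in Pending in a namespace
# (defaults to "netbackup", the namespace used in this article).
pending_pods() {
  kubectl get pods -n "${1:-netbackup}" --field-selector=status.phase=Pending -o name
}

# Hypothetical helper: read `kubectl describe pod` output on stdin and
# succeed only if the scheduler reported "Too many pods".
hit_pod_limit() {
  grep -q 'FailedScheduling.*Too many pods'
}
```

For example, `kubectl describe pod <pod_in_pending_state> -n netbackup | hit_pod_limit && echo "pod limit reached"` prints the message only when the event shown above is present.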

Cause

The maximum number of pods per node was exceeded.  This maximum varies depending on the instance type.

Solution

Identify the maximum number of pods the node can handle by describing one of the nodes:

# kubectl describe node <node-name>

The example output is as follows.

...
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           76224326324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7244720Ki
  pods:                        110

...

In the previous output, 110 is the maximum number of pods that Kubernetes will deploy to the node.
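
To compare that limit with the number of pods a node is currently running, something like the following could be used. It is a rough sketch: the function names are hypothetical, and the count excludes pods in the Succeeded or Failed phase on the assumption that completed pods do not occupy a slot.

```shell
# Hypothetical helper: print the allocatable pod limit of a node.
node_max_pods() {
  kubectl get node "$1" -o jsonpath='{.status.allocatable.pods}'
}

# Hypothetical helper: count pods scheduled on a node, skipping pods
# that have already completed (Succeeded or Failed).
node_pod_count() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1,status.phase!=Succeeded,status.phase!=Failed" \
    | wc -l | tr -d ' '
}

# Print "<node> <scheduled>/<max>" for a quick headroom check.
node_pod_usage() {
  printf '%s %s/%s\n' "$1" "$(node_pod_count "$1")" "$(node_max_pods "$1")"
}
```

Running `node_pod_usage <node-name>` on a node whose two numbers are equal indicates that the node is the one reporting "Too many pods".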

AWS also provides a script that can be run to identify the maximum number of pods for an instance type:

https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html#determine-max-pods
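
As a rough illustration of why the limit depends on the instance type, the sketch below applies the commonly documented formula for the Amazon VPC CNI in its default mode: ENIs x (IPv4 addresses per ENI - 1) + 2. It assumes prefix delegation and custom networking are not in use, and the function name eni_max_pods is hypothetical. The AWS script remains the authoritative source, because it also accounts for the CNI version and other settings.

```shell
# Hypothetical helper: estimate max pods for the default VPC CNI mode.
#   $1 = number of network interfaces (ENIs) the instance type supports
#   $2 = number of IPv4 addresses per ENI
# Assumption: no prefix delegation and no custom networking.
eni_max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}
```

For an instance type with 3 ENIs and 10 IPv4 addresses per ENI, `eni_max_pods 3 10` gives 29. A node reporting a different value, such as the 110 in the example output, is likely configured differently.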

There are multiple ways to address this, but in every case you need to increase the number of available pod slots.  This can be done by adding a node to the node pool, or by changing the instance type to one that can handle more pods.
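
When adding nodes, a back-of-the-envelope estimate of how many are needed can be sketched as below. The function name nodes_to_add and its parameters are hypothetical. The estimate assumes every new node has the same pod limit and runs a fixed number of system pods (for example DaemonSet pods) that use up part of that limit. As the taints in the scheduler message show, the new nodes must also belong to the node pool that the pending pods are allowed to run on.

```shell
# Hypothetical helper: estimate how many nodes to add.
#   $1 = number of pending pods
#   $2 = max pods per new node (the "pods" value under Allocatable)
#   $3 = pods each new node already runs by default (assumed; default 0)
nodes_to_add() {
  pending=$1
  free_slots=$(( $2 - ${3:-0} ))
  if [ "$free_slots" -le 0 ]; then
    echo "no free pod slots per node" >&2
    return 1
  fi
  # Ceiling division: round up to whole nodes.
  echo $(( (pending + free_slots - 1) / free_slots ))
}
```

For example, `nodes_to_add 5 110 10` returns 1, while `nodes_to_add 250 110 10` returns 3.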
