NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded
Problem
NetBackup CloudScale pods may go into pending status when the max number of pods threshold has been exceeded.
Error Message
Performing a describe on the pod in the pending state shows events similar to the following:
# kubectl describe pod <pod_in_pending_state> -n netbackup
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 79s (x15182 over 13d) default-scheduler 0/10 nodes are available: 1 Too many pods, 1 node(s) had taint {group: cpdata}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {group: media}, that the pod didn't tolerate, 4 node(s) had taint {group: msdp}, that the pod didn't tolerate.
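To list all pods currently stuck in the Pending state, a field selector can be used (the netbackup namespace is taken from the example above; adjust as needed):
# kubectl get pods -n netbackup --field-selector=status.phase=Pending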
Cause
The maximum number of pods per node was exceeded. This limit varies depending on the instance type.
Solution
Identify the maximum number of pods the node can handle:
Describe one of the nodes to determine its max-pods value.
# kubectl describe node <node-name>
The example output is as follows.
...Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 76224326324
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7244720Ki
pods: 110
...
In the previous output, 110 is the maximum number of pods that Kubernetes will deploy to the node.
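To compare that limit against the node's current load, count the pods scheduled to the node (a quick sketch; <node-name> is a placeholder):
# kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> --no-headers | wc -l
If the count is at or near the Allocatable pods value, no additional pods can be scheduled on that node.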
AWS also provides a script that can be run to identify the maximum number of pods per instance type:
https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html#determine-max-pods
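As an illustration, the max-pods-calculator.sh script referenced on that page can be downloaded and run as follows (the instance type and CNI version below are example values; check the AWS page for the current download location and options):
# curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
# chmod +x max-pods-calculator.sh
# ./max-pods-calculator.sh --instance-type m5.large --cni-version 1.9.0-eksbuild.1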
There are multiple ways to address this, but the bottom line is that the available pod capacity needs to be increased. This can be done by adding a node to the node pool or by changing the instance type to one that can handle more pods; an example follows.
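For example, on an EKS cluster managed with eksctl, the node group can be scaled out (the cluster and node group names are placeholders):
# eksctl scale nodegroup --cluster=<cluster-name> --name=<nodegroup-name> --nodes=4
Changing the instance type typically means creating a new node group with the larger instance type and migrating the workloads to it.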