SDS Operator Pod in CrashLoopBackOff state following OpenShift node removal

Article: 100074564
Last Published: 2025-07-01
Product(s): InfoScale & Storage Foundation

Problem 

SDS Operator Pod in CrashLoopBackOff state following OpenShift node removal


Error Message

CrashLoopBackOff
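
In the pod listing, the failure typically appears as follows (illustrative output; the namespace, pod name suffix, and restart counters vary by deployment):

# oc get pods -n <infoscale-namespace> | grep sds-operator
infoscale-sds-operator-7c9b5d6f4-xk2lp   0/1   CrashLoopBackOff   14 (2m ago)   52m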

Cause

In a disk-based fencing configuration that uses the VIKE solution, when a worker node is completely removed from the OpenShift cluster, the sds-operator successfully initiates the node-removal process but then fails to transition back to a running state.

Example:

# oc get infoscalecluster

NAME          VERSION   CLUSTERID   STATE                  DISKGROUPS          STATUS     AGE
isc-primary   8.0.400   1000        ProcessingRemoveNode   vrts_kube_dg-1000   Degraded   262d
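
For more detail on why the removal is stuck, describing the custom resource can help; the command below is a general suggestion, and the exact condition fields it reports depend on the InfoScale release:

# oc describe infoscalecluster isc-primary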


Solution

Please contact Arctera Support to obtain the updated SDS Operator images compatible with version 8.0.400.

Steps to replace the image:

  • Load the image into the private registry. Log in to the registry, then load, tag, and push the image:

    podman load -i <sds-operator-image>
    podman tag <image_id> <registry_path>/infoscale-sds-operator:8.0.400-rhel
    podman push <registry_path>/infoscale-sds-operator:8.0.400-rhel

  • Log in to the node where the sds-operator pod is running as the core user.

  • Elevate to the root user: sudo su - root

  • Pull the updated image from the registry:
    podman pull <registry_path>/infoscale-sds-operator:8.0.400-rhel

  • On the bastion host, edit the SDS Operator deployment:
    oc edit deployment infoscale-sds-operator

  • Change both occurrences of image: to <registry_path>/infoscale-sds-operator:8.0.400-rhel

  • At the top of the pod template's spec (spec.template.spec), add the following, as shown in the sketch after this list:
    nodeName: <hostname of the worker node where the sds-operator pod is in CrashLoopBackOff>
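
Taken together, the edited deployment should resemble the sketch below. The hostname and container name are placeholders and unrelated fields are omitted; note that nodeName is a pod-level field, so in a Deployment it sits under spec.template.spec:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: infoscale-sds-operator
    spec:
      template:
        spec:
          nodeName: worker-1.example.com   # placeholder: the node where the image was pulled above
          containers:
            - name: infoscale-sds-operator   # container name is illustrative
              image: <registry_path>/infoscale-sds-operator:8.0.400-rhel
              # the second image: occurrence in the live deployment (not shown) takes the same value

After saving the edit, the deployment rolls out a replacement pod; oc get pods should show it reach the Running state, and oc get infoscalecluster should eventually move out of ProcessingRemoveNode.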

References

JIRA : STESC-9614
