
NetBackup™ Deployment Guide for Kubernetes Clusters

Last Published: 2025-02-26

Product(s): NetBackup (10.5.0.1)

Monitoring with Amazon CloudWatch

You can use Amazon CloudWatch to collect Prometheus metrics and monitor the pods in an MSDP-X cluster.

To configure Amazon CloudWatch

Install the CloudWatch agent with Prometheus metrics collection on EKS.
See AWS documentation.
Install the CloudWatch agent on EKS clusters. Select the EC2 launch type, and download the template YAML file Prometheus-eks.yaml.

Update the YAML file with the following sample configuration.

# create configmap for prometheus cwagent config
apiVersion: v1
data:
  # cwagent json config
  cwagentconfig.json: |
    {
      "logs": {
        "metrics_collected": {
          "prometheus": {
            "prometheus_config_path": "/etc/prometheusconfig/
             prometheus.yaml",
            "emf_processor": {
              "metric_declaration": [
                {
                  "source_labels": ["job"],
                  "label_matcher": "^msdpoperator-metrics",
                  "dimensions":[
                    ["ClusterName","NameSpace"]
                  ],
                  "metric_selectors": [
                    "msdpoperator_reconcile_failed",
                    "msdpoperator_operator_run",
                    "msdpoperator_diskFreeLess5GBEngines_total",
                    "msdpoperator_diskFreeMiBytesInEngine",
                    "msdpoperator_diskFreeLess10GBClusters_total",
                    "msdpoperator_totalDiskFreePercentInCluster",
                    "msdpoperator_diskFreePercentInEngine",
                    "msdpoperator_pvcFreePercentInCluster",
                    "msdpoperator_unhealthyEngines_total",
                    "msdpoperator_createdPods_total"
                  ]
                }
              ]
            }
          }
        },
        "force_flush_interval": 5
      }
    }
kind: ConfigMap
metadata:
  name: prometheus-cwagentconfig
  namespace: amazon-cloudwatch
 
---
# create configmap for prometheus scrape config
apiVersion: v1
data:
  # prometheus config
  prometheus.yaml: |
    global:
      scrape_interval: 1m
      scrape_timeout: 10s
    scrape_configs:
    - job_name: 'msdpoperator-metrics'
 
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
 
      kubernetes_sd_configs:
      - role: pod
 
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: NameSpace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: PodName
 
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: amazon-cloudwatch
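
The __address__ relabel rule in the scrape configuration above can be hard to read. The following Python sketch imitates what that rule does, assuming the standard Prometheus relabel behavior: the source label values are joined with ";", the regex must match the whole joined string, and an unmatched rule leaves the target label unchanged. The function name rewrite_address and the sample addresses are illustrative and are not part of NetBackup or Prometheus.

```python
import re

# Regex copied from the __address__ relabel rule in the sample configuration.
ADDRESS_REGEX = re.compile(r"([^:]+)(?::\d+)?;(\d+)")
# Python spelling of the Prometheus replacement "$1:$2".
REPLACEMENT = r"\1:\2"


def rewrite_address(address: str, port_annotation: str) -> str:
    """Imitate the 'replace' relabel action that targets __address__.

    address         -- value of __address__ (host or host:port)
    port_annotation -- value of the prometheus.io/port pod annotation,
                       or "" when the annotation is absent
    """
    # Prometheus joins the source label values with ';' before matching.
    joined = f"{address};{port_annotation}"
    match = ADDRESS_REGEX.fullmatch(joined)
    if match is None:
        # No match: the target label keeps its current value.
        return address
    return match.expand(REPLACEMENT)


print(rewrite_address("10.0.1.23:8080", "8443"))  # 10.0.1.23:8443
print(rewrite_address("10.0.1.23", "8443"))       # 10.0.1.23:8443
print(rewrite_address("10.0.1.23:8080", ""))      # 10.0.1.23:8080
```

In other words, the rule replaces any port that was discovered with the pod with the port named in the pod annotation, and does nothing for pods that do not carry the annotation.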

See the table "Supported Prometheus metrics list in MSDP Scaleout" for the Prometheus metrics that MSDP Scaleout supports.
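
To see which scraped metrics the sample metric_declaration would forward to CloudWatch, the selection logic can be approximated in a few lines of Python. This is a sketch under two assumptions that you should verify against the AWS documentation: that label_matcher and each metric_selectors entry are treated as unanchored regular expressions, and that the single source label "job" is matched on its own. The function name is_emitted is hypothetical.

```python
import re

# Values copied from the sample cwagentconfig.json.
LABEL_MATCHER = "^msdpoperator-metrics"
METRIC_SELECTORS = [
    "msdpoperator_reconcile_failed",
    "msdpoperator_operator_run",
    "msdpoperator_diskFreeLess5GBEngines_total",
    "msdpoperator_diskFreeMiBytesInEngine",
    "msdpoperator_diskFreeLess10GBClusters_total",
    "msdpoperator_totalDiskFreePercentInCluster",
    "msdpoperator_diskFreePercentInEngine",
    "msdpoperator_pvcFreePercentInCluster",
    "msdpoperator_unhealthyEngines_total",
    "msdpoperator_createdPods_total",
]


def is_emitted(job: str, metric_name: str) -> bool:
    """Return True if the sample declaration would select this metric.

    Assumption: both the label matcher and the selectors are applied as
    unanchored regex searches.
    """
    if re.search(LABEL_MATCHER, job) is None:
        return False
    return any(re.search(sel, metric_name) for sel in METRIC_SELECTORS)


print(is_emitted("msdpoperator-metrics", "msdpoperator_unhealthyEngines_total"))  # True
print(is_emitted("msdpoperator-metrics", "msdpoperator_reconcile_total"))         # False
print(is_emitted("other-job", "msdpoperator_unhealthyEngines_total"))             # False
```

Note that msdpoperator_reconcile_total is a supported metric but is not in the sample metric_selectors list, so under these assumptions it is collected only if you add it to the list.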

Apply the YAML file.
kubectl apply -f Prometheus-eks.yaml
The default log group name is /aws/containerinsights/{cluster_name}/prometheus.
Create Amazon CloudWatch alarms.
See Using Amazon CloudWatch alarms in AWS documentation.
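
As an alternative to the console, an alarm can be created with the AWS SDK for Python (boto3). The sketch below builds the parameters for an alarm that fires when msdpoperator_unhealthyEngines_total is greater than 0. It assumes that the agent publishes metrics to the default ContainerInsights/Prometheus namespace (no metric_namespace override) and uses the ClusterName and NameSpace dimensions from the sample configuration. The function names, the alarm name format, and the statistic, period, and threshold values are illustrative choices, not values from NetBackup.

```python
def build_unhealthy_engine_alarm(cluster_name: str, namespace: str) -> dict:
    """Build put_metric_alarm parameters for unhealthy MSDP engines.

    namespace is the value of the NameSpace label that the scrape
    configuration attaches to the metric (the Kubernetes namespace of the
    scraped pod).
    """
    return {
        "AlarmName": f"msdp-unhealthy-engines-{cluster_name}-{namespace}",
        # Assumed default namespace for Prometheus metrics from the agent.
        "Namespace": "ContainerInsights/Prometheus",
        "MetricName": "msdpoperator_unhealthyEngines_total",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "NameSpace", "Value": namespace},
        ],
        "Statistic": "Maximum",
        "Period": 300,              # evaluate over 5-minute periods
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }


def create_alarm(params: dict, region_name: str) -> None:
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    cloudwatch.put_metric_alarm(**params)


params = build_unhealthy_engine_alarm("my-eks-cluster", "sample-cr-namespace")
print(params["AlarmName"])
```

Call create_alarm(params, "<region>") only after you confirm in the CloudWatch console that the metric appears with the expected namespace and dimensions.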
In the CloudWatch console, add the related log query. In the navigation pane, select Logs Insights.
For example, to find the MSDP Scaleout cluster engines whose free space fell below 1 GB in the past 5 minutes, select the log group from the drop-down list and set the time range to 5m on the timeline.
Log query:
```
fields @timestamp, @message
| filter msdpoperator_diskFreeMiBytesInEngine <= 1024
| sort @timestamp desc
```
If multiple MSDP Scaleout clusters are deployed in the same EKS cluster, add a namespace filter to narrow the results. For example, search for the MSDP engines that have less than 1 GB of free space in the namespace sample-cr-namespace.
Log query:
```
fields @timestamp, @message
| filter msdpscalout_ns == "sample-cr-namespace"
| filter msdpoperator_diskFreeMiBytesInEngine <= 1024
| sort @timestamp desc
```
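
The same log queries can also be run outside the console. The following Python sketch builds the query text and submits it with the boto3 CloudWatch Logs client (start_query and get_query_results). The function names, the polling interval, and the example threshold of 1024 MiB (1 GB expressed in the metric's MiBytes unit) are assumptions made for illustration.

```python
import time
from typing import Optional


def build_free_space_query(threshold_mib: int, namespace: Optional[str] = None) -> str:
    """Build a Logs Insights query for engines at or below a free-space threshold."""
    lines = ["fields @timestamp, @message"]
    if namespace is not None:
        # Restrict the results to one MSDP Scaleout namespace.
        lines.append(f'| filter msdpscalout_ns == "{namespace}"')
    lines.append(f"| filter msdpoperator_diskFreeMiBytesInEngine <= {threshold_mib}")
    lines.append("| sort @timestamp desc")
    return "\n".join(lines)


def run_query(log_group: str, query: str, minutes: int = 5,
              region_name: Optional[str] = None) -> dict:
    """Run the query over the last `minutes` minutes and wait for it to finish.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported here so the builder works without boto3

    logs = boto3.client("logs", region_name=region_name)
    end = int(time.time())
    started = logs.start_query(
        logGroupName=log_group,
        startTime=end - minutes * 60,
        endTime=end,
        queryString=query,
    )
    while True:
        result = logs.get_query_results(queryId=started["queryId"])
        if result["status"] not in ("Scheduled", "Running"):
            return result
        time.sleep(1)


print(build_free_space_query(1024, namespace="sample-cr-namespace"))
```

Pass the Prometheus log group of your cluster as log_group when you call run_query. The returned dictionary contains the final query status and the matching log events.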

MSDP Scaleout supports the following Prometheus metrics:

Table: Supported Prometheus metrics list in MSDP Scaleout

Metrics	Type	Filters	Description
msdpoperator_reconcile_total	Counter	N/A	The total number of reconcile loops that msdp-operator has run.
msdpoperator_reconcile_failed	Counter	N/A	The total number of reconcile loops that msdp-operator failed to run.
msdpoperator_operator_run	Counter	N/A	The total number of operator runs.
msdpoperator_diskFreeLess5GBEngines_total	Gauge	msdpscalout_ns	The number of checked engines that have less than 5 GB of free space.
msdpoperator_diskFreeMiBytesInEngine	Gauge	msdpscalout_ns	The free space of the current engine, in MiBytes.
msdpoperator_diskFreeLess10GBClusters_total	Gauge	msdpscalout_ns	The number of checked msdpscaleout applications that have less than 10 GB of free space.
msdpoperator_totalDiskFreePercentInCluster	Gauge	msdpscalout_ns	The percentage of free space in the msdpscaleout applications, expressed as a fraction. For example, 0.95 means 95%.
msdpoperator_diskFreePercentInEngine	Gauge	msdpscalout_ns	The percentage of free space in the current engine.
msdpoperator_pvcFreePercentInCluster	Gauge	msdpscalout_ns, component	The percentage of free space in the used PVCs.
msdpoperator_unhealthyEngines_total	Gauge	msdpscalout_ns	The total number of unhealthy engines.
msdpoperator_createdPods_total	Gauge	msdpscalout_ns, component	The total number of created msdpscaleout pods.