Scrape metrics & logs using Grafana agent-operator

4 min readJan 28, 2023

In this article, I will talk about how you can configure metrics & logs using the grafana agent operator using helm chart & custom resource definition CRD.

Installing agent-operator using helm chart

Begin by adding and updating the grafana Helm chart repo:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install the chart -

helm install <chart_alias_name> grafana/grafana-agent-operator

Configuring & applying custom resource definition (CRD)

It will consist of the following —
1- Grafana agent resource
2- MetricInstance resource
3- ServiceMonitor resource to collect cAdvisor and kubelet metrics.
4- LogsInstance resource
5- PodLogs resource to collect container logs from Kubernetes Pods.

Grafana agent resource — YAML file is defined below -

apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
  labels:
    app: grafana-agent
spec:
  image: grafana/agent:v0.30.2
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: cloud

  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default

MetricInstance resource — YAML file is defined below :

apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
  - url: your_remote_write_URL
    basicAuth:
      username:
        name: primary-credentials-metrics
        key: username
      password:
        name: primary-credentials-metrics
        key: password

  # As an alternative authentication method, Grafana Agent also supports OAuth2.
  # - url: your_remote_write_URL
  #   oauth2:
  #     clientId:
  #       secret:
  #         key: username # Kubernetes Secret Key
  #         name: primary-credentials-metrics # Kubernetes Secret Name
  #     clientSecret:
  #       key: password # Kubernetes Secret Key
  #       name: primary-credentials-metrics # Kubernetes Secret Name
  #     tokenUrl: https://auth.example.com/realms/master/protocol/openid-connect/token


  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary

Replace your_remote_write_URL with your endpoint URL for metrics.

Here is the corresponding secret yaml config for the remoteWrite config :

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: 'your_cloud_prometheus_username'
  password: 'your_cloud_prometheus_API_key'

ServiceMonitor resource to collect cAdvisor and kubelet metrics- YAML file is defined below :

kubelte servicemonitor —

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/kubelet
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet

cAdvsior ServiceMonitor -

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    honorTimestamps: false
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/cadvisor
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet

LogsInstance resource — YAML file is defined below :

apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-logs
spec:
  clients:
  - url: your_remote_logs_URL
    basicAuth:
      username:
        name: primary-credentials-logs
        key: username
      password:
        name: primary-credentials-logs
        key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary

Replace your_remote_logs_URL with your endpoint URL for logs.

Here is the corresponding secret yaml config for the remote logs config :

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: 'your_username_here'
  password: 'your_password_here'

PodLogs resource — YAML file is defined below :

apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
  namespace: default
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}

IMPORTANT NOTES :

1 — If you are deploying in a separate name space make sure to change the default value to your namespace value.
2- Once you apply each resource you need to use the following command to apply the changes to your cluster —

kubectl apply -f <yaml_file_name> -n <namespace_name>

Lets see the results

Once done you can see the metrics shown on your explore UI —

Here is a similar view for logs on your explore UI —

This article was inspired & referred from here — https://grafana.com/docs/agent/latest/operator/helm-getting-started/

Scrape metrics & logs using Grafana agent-operator

Installing agent-operator using helm chart

Configuring & applying custom resource definition (CRD)

IMPORTANT NOTES :

Lets see the results

Written by Ayan Bhadury