Scrape metrics & logs using Grafana agent-operator

Ayan Bhadury
4 min read · Jan 28, 2023

In this article, I will show how to collect metrics and logs with the Grafana Agent Operator, using its Helm chart and the custom resources defined by the operator's custom resource definitions (CRDs).

Installing agent-operator using helm chart

Begin by adding the Grafana Helm chart repo and updating your local chart index:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install the chart, replacing <chart_alias_name> with the release name you want to use:

helm install <chart_alias_name> grafana/grafana-agent-operator

Configuring & applying the custom resources (CRs)

The setup consists of the following resources:
1- GrafanaAgent resource
2- MetricsInstance resource
3- ServiceMonitor resources to collect cAdvisor and kubelet metrics
4- LogsInstance resource
5- PodLogs resource to collect container logs from Kubernetes Pods

GrafanaAgent resource, together with the ServiceAccount and RBAC objects it needs. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
  labels:
    app: grafana-agent
spec:
  image: grafana/agent:v0.30.2
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: cloud

  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default

MetricsInstance resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
  - url: your_remote_write_URL
    basicAuth:
      username:
        name: primary-credentials-metrics
        key: username
      password:
        name: primary-credentials-metrics
        key: password

  # As an alternative authentication method, Grafana Agent also supports OAuth2.
  # - url: your_remote_write_URL
  #   oauth2:
  #     clientId:
  #       secret:
  #         key: username # Kubernetes Secret Key
  #         name: primary-credentials-metrics # Kubernetes Secret Name
  #     clientSecret:
  #       key: password # Kubernetes Secret Key
  #       name: primary-credentials-metrics # Kubernetes Secret Name
  #     tokenUrl: https://auth.example.com/realms/master/protocol/openid-connect/token


  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary

Replace your_remote_write_URL with your endpoint URL for metrics.
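The resources in this article find each other through label selectors: the GrafanaAgent picks up the MetricsInstance via the agent: grafana-agent-metrics label, and the MetricsInstance picks up any ServiceMonitor, PodMonitor or Probe labelled instance: primary. As a rough illustration of how a matchLabels selector behaves, here is a minimal Python sketch. The matches function is a hypothetical helper written for this article, not part of the operator or of any Kubernetes client library.

```python
def matches(match_labels: dict, object_labels: dict) -> bool:
    """Return True when every key/value pair in matchLabels is present on the object.

    An empty matchLabels ({}) has no requirements, so it matches every object.
    """
    return all(object_labels.get(key) == value for key, value in match_labels.items())


# Labels copied from the manifests in this article.
agent_metrics_selector = {"agent": "grafana-agent-metrics"}   # GrafanaAgent spec.metrics.instanceSelector
metrics_instance_labels = {"agent": "grafana-agent-metrics"}  # MetricsInstance metadata.labels
service_monitor_selector = {"instance": "primary"}            # MetricsInstance spec.serviceMonitorSelector

print(matches(agent_metrics_selector, metrics_instance_labels))      # True
print(matches(service_monitor_selector, {"instance": "primary"}))    # True
print(matches(service_monitor_selector, {"instance": "secondary"}))  # False
print(matches({}, {"any": "label"}))                                 # True
```

If a monitor is not being scraped, a mismatch between these labels is one of the first things worth checking.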

Here is the corresponding Secret manifest that holds the credentials referenced by the remoteWrite config:

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: 'your_cloud_prometheus_username'
  password: 'your_cloud_prometheus_API_key'
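The manifest uses stringData, which lets you write the values in plain text; Kubernetes stores them base64-encoded under data, so that is what you will see if you read the Secret back from the cluster. The Python sketch below only illustrates that encoding step. The to_secret_data function is a hypothetical helper, and the values are the placeholders from the manifest, not real credentials.

```python
import base64


def to_secret_data(string_data: dict) -> dict:
    """Encode stringData values the way they appear under `data` in a stored Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


# Placeholder values from the manifest above, not real credentials.
string_data = {
    "username": "your_cloud_prometheus_username",
    "password": "your_cloud_prometheus_API_key",
}
data = to_secret_data(string_data)

# The base64 form, as it would appear under `data`.
print(data["username"])
# Decoding gives back the original value.
print(base64.b64decode(data["username"]).decode("utf-8"))
```

Keep in mind that base64 is an encoding, not encryption, so the manifest should be handled like any other file containing credentials.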

ServiceMonitor resources to collect cAdvisor and kubelet metrics. The YAML files are defined below:

kubelet ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/kubelet
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet

cAdvisor ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    honorTimestamps: false
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/cadvisor
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
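Both ServiceMonitors use a metricRelabelings rule with action: keep on the __name__ label, which drops every scraped series whose metric name does not match the regex. Prometheus-style relabel regexes are anchored at both ends, so the whole metric name has to match one of the alternatives. The Python sketch below mimics that filter with a shortened version of the regex; is_kept is a hypothetical helper, and container_memory_rss_total is a made-up name used only to show the anchoring.

```python
import re

# A shortened version of the regex used in the ServiceMonitors above.
KEEP_REGEX = "container_cpu_usage_seconds_total|container_memory_rss|kubelet_running_pods|up"


def is_kept(metric_name: str, regex: str = KEEP_REGEX) -> bool:
    """Mimic `action: keep` on `__name__`: the whole name must match the regex."""
    return re.fullmatch(regex, metric_name) is not None


# container_memory_rss_total is a made-up name; node_load1 is simply not in the list.
for name in ["up", "container_memory_rss", "container_memory_rss_total", "node_load1"]:
    print(f"{name}: {'kept' if is_kept(name) else 'dropped'}")
```

To collect an additional metric, you would add its name to the regex as another alternative separated by |.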

LogsInstance resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-logs
spec:
  clients:
  - url: your_remote_logs_URL
    basicAuth:
      username:
        name: primary-credentials-logs
        key: username
      password:
        name: primary-credentials-logs
        key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR.
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary

Replace your_remote_logs_URL with your endpoint URL for logs.

Here is the corresponding Secret manifest that holds the credentials referenced by the clients config:

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: 'your_username_here'
  password: 'your_password_here'

PodLogs resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
  namespace: default
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}
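The last relabeling rule is the one that tells the agent which log files to tail. It joins the pod UID and the container name with the separator /, matches the result against the default regex (.*), and substitutes it for $1 in the replacement to produce the __path__ label. The Python sketch below walks through that substitution under those assumptions; relabel_replace is a hypothetical helper that handles only $1, and the UID and container name are invented values.

```python
import re


def relabel_replace(source_values, separator, replacement, regex="(.*)"):
    """Mimic a Prometheus-style `replace` relabel rule for a single target label."""
    # Source label values are concatenated with the separator before matching.
    joined = separator.join(source_values)
    match = re.fullmatch(regex, joined)
    if match is None:
        return None  # no match: the rule does not set the target label
    # Relabel configs reference capture groups as $1; only $1 is handled here.
    return replacement.replace("$1", match.group(1))


path = relabel_replace(
    source_values=["0a1b2c3d", "my-app"],  # invented pod UID and container name
    separator="/",
    replacement="/var/log/pods/*$1/*.log",
)
print(path)  # /var/log/pods/*0a1b2c3d/my-app/*.log
```

The leading * in the result is a glob, which lets the path match a pod log directory whose name ends with the pod UID.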

IMPORTANT NOTES:

1- If you are deploying to a namespace other than default, change every namespace: default value in the manifests above to your namespace.
2- Save each resource to a YAML file and apply it to your cluster with the following command:

kubectl apply -f <yaml_file_name> -n <namespace_name>

Let's see the results

Once everything is applied, the scraped metrics appear in the Explore UI of your Grafana instance.

The collected logs appear in the Explore UI in the same way.

This article is based on the Grafana Agent Operator documentation: https://grafana.com/docs/agent/latest/operator/helm-getting-started/
