Scrape metrics & logs using Grafana agent-operator

Ayan Bhadury
Jan 28, 2023

In this article, I will show how to collect metrics and logs with the Grafana Agent Operator, installed from its Helm chart and configured through custom resources (instances of the operator's custom resource definitions, or CRDs).

Installing agent-operator using helm chart

Begin by adding the Grafana Helm chart repository and updating it:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install the chart, replacing <chart_alias_name> with a release name of your choice:

helm install <chart_alias_name> grafana/grafana-agent-operator

Configuring & applying the custom resources (CRs)

The setup consists of the following resources:
1. GrafanaAgent resource
2. MetricsInstance resource
3. ServiceMonitor resources to collect cAdvisor and kubelet metrics
4. LogsInstance resource
5. PodLogs resource to collect container logs from Kubernetes Pods

GrafanaAgent resource, together with the ServiceAccount, ClusterRole, and ClusterRoleBinding it runs with. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
  labels:
    app: grafana-agent
spec:
  image: grafana/agent:v0.30.2
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: cloud

  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default

MetricsInstance resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
  - url: your_remote_write_URL
    basicAuth:
      username:
        name: primary-credentials-metrics
        key: username
      password:
        name: primary-credentials-metrics
        key: password

  # As an alternative authentication method, Grafana Agent also supports OAuth2.
  # - url: your_remote_write_URL
  #   oauth2:
  #     clientId:
  #       secret:
  #         key: username # Kubernetes Secret Key
  #         name: primary-credentials-metrics # Kubernetes Secret Name
  #     clientSecret:
  #       key: password # Kubernetes Secret Key
  #       name: primary-credentials-metrics # Kubernetes Secret Name
  #     tokenUrl: https://auth.example.com/realms/master/protocol/openid-connect/token

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary

Replace your_remote_write_URL with your endpoint URL for metrics.

Here is the corresponding Secret YAML that holds the credentials referenced by the remoteWrite config:

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: 'your_cloud_prometheus_username'
  password: 'your_cloud_prometheus_API_key'
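A note on stringData: it accepts plain-text values, and Kubernetes stores them base64-encoded under the Secret's data field. If you would rather fill in data yourself, the encoded value can be produced roughly as in this sketch. The username below is a made-up placeholder, not a real credential.

```shell
# Made-up placeholder value; substitute your own username or API key.
username='123456'

# printf '%s' avoids the trailing newline that echo would add,
# which would otherwise end up inside the encoded credential.
encoded=$(printf '%s' "$username" | base64)
echo "$encoded"

# Decoding again is a quick way to check the value round-trips.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

The encoded string is what would go under data: in place of the plain value under stringData:.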

ServiceMonitor resources to collect cAdvisor and kubelet metrics. The YAML files are defined below:

kubelet ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/kubelet
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet

cAdvisor ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    honorTimestamps: false
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/cadvisor
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
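To illustrate what the keep rule under metricRelabelings does: Prometheus-style relabeling anchors the regex at both ends, so a series survives only if its whole __name__ matches one of the alternatives. The shell sketch below imitates that behaviour with grep -Ex on a shortened version of the allowlist. The shortened regex and the sample metric names are assumptions picked for illustration, and the agent does its filtering with its own regex engine, not with grep.

```shell
# Shortened, illustrative version of the allowlist regex from the ServiceMonitors.
keep_regex='container_cpu_usage_seconds_total|container_memory_rss|kubelet_running_pods|up'

# Hypothetical metric names as they might come back from a scrape.
scraped='container_cpu_usage_seconds_total
container_memory_rss
container_memory_rss_extra
kubelet_running_pods
apiserver_request_total
up'

# -E enables alternation; -x requires the whole line to match,
# which mirrors the full anchoring used by relabeling.
kept=$(printf '%s\n' "$scraped" | grep -Ex "$keep_regex")
printf '%s\n' "$kept"
```

Here container_memory_rss_extra and apiserver_request_total are dropped: the first only contains an allowed name as a prefix, and the second is not on the list at all.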

LogsInstance resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-logs
spec:
  clients:
  - url: your_remote_logs_URL
    basicAuth:
      username:
        name: primary-credentials-logs
        key: username
      password:
        name: primary-credentials-logs
        key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR.
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary

Replace your_remote_logs_URL with your endpoint URL for logs.

Here is the corresponding Secret YAML that holds the credentials for the logs client:

apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: 'your_username_here'
  password: 'your_password_here'

PodLogs resource. The YAML file is defined below:

apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
  namespace: default
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}
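The last relabeling rule in the PodLogs resource is the least obvious one, so here is a small shell sketch of what it computes. Relabeling joins the sourceLabels values with the separator, matches the result against the regex (which defaults to (.*) when omitted, so $1 is the whole joined string), and writes the expanded replacement into the target label. The pod UID and container name below are made-up values.

```shell
# Made-up values for the two source labels.
pod_uid='0a1b2c3d-1111-2222-3333-444455556666'
container_name='nginx'

# sourceLabels are joined with the separator "/".
joined="${pod_uid}/${container_name}"

# The default regex (.*) captures the whole joined string as $1,
# which is substituted into the replacement /var/log/pods/*$1/*.log.
log_path="/var/log/pods/*${joined}/*.log"

echo "$log_path"
# prints /var/log/pods/*0a1b2c3d-1111-2222-3333-444455556666/nginx/*.log
```

The leading * matters because the kubelet names pod log directories <namespace>_<pod-name>_<pod-uid>, so the glob matches the directory whose name ends in the UID.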

IMPORTANT NOTES:

1. If you are deploying to a namespace other than default, change every namespace: default value in the manifests to your namespace.
2. Save each resource to a YAML file, then apply it to your cluster with the following command:

kubectl apply -f <yaml_file_name> -n <namespace_name>
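For note 1, editing every manifest by hand is error-prone, so one option is to rewrite the namespace with sed before applying. This is a rough sketch: the file names and the monitoring namespace are placeholders, and the pattern assumes the manifests spell the field exactly as namespace: default. The first lines only create a small sample file so the snippet runs on its own.

```shell
TARGET_NS='monitoring'   # placeholder namespace; use your own
workdir=$(mktemp -d)

# Sample manifest standing in for one of the files from this article.
cat > "$workdir/secret.yaml" <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
EOF

# Rewrite "namespace: default" while preserving the line's indentation.
sed -E "s/^([[:space:]]*)namespace: default$/\1namespace: ${TARGET_NS}/" \
  "$workdir/secret.yaml" > "$workdir/secret.${TARGET_NS}.yaml"

cat "$workdir/secret.${TARGET_NS}.yaml"
```

The rewritten file can then be applied with the kubectl apply command shown above. Note that this pattern leaves namespaceSelector.matchNames entries untouched, since those list the namespaces to scrape rather than the namespace the resource lives in.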

Let's see the results

Once done, you can see the metrics in your Grafana Explore UI.

Logs appear in the Explore UI in the same way.

This article was inspired by, and draws on, the Grafana documentation: https://grafana.com/docs/agent/latest/operator/helm-getting-started/
