Table of Contents
TLDR;
The prometheus configurations are below. Be sure to give the prometheus
service account cluster permissions to GET nodes/proxy and nodes api endpoints.
Go directly to the 3. Prometheus Configurations
Google cloud monitor only exposes a small subsection of cAdvisor metrics. With the setup below you’ll be able to collect all of the cAdvisor metrics from GKE. Here are the steps to directly query kubernetes to get cAdvisor metrics and the Prometheus configuration.
1. Create Service
To scrape the cAdvisor endpoint you’ll need to create a service account with cluster permissions to GET nodes/proxy and nodes.
Create a manifest called sa-manifests.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: test
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: test
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: test
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: test
subjects:
- kind: ServiceAccount
name: test
namespace: default
Run kubectl apply -f sa-manifests.yaml
2. Test API Manually
Create manifest file call pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: network
namespace: default
spec:
containers:
- name: network
image: praqma/network-multitool:c3d4e04
serviceAccountName: test
Run the following commands
kubectl apply -f pod.yaml
kubectl exec -it network bash -n default
Now that we are in the lets actually make a call api to kubernetes api get the cAdvisor Metrics. Run these individual commands.
# export the KSA bearer token to an env variable
export BEARER_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# Find the first K8s node
export NODE_NAME=$(curl https://kubernetes.default.svc.cluster.local:443/api/v1/nodes/ -s -H "Authorization: Bearer $BEARER_TOKEN" -k | jq -r .items[0].metadata.name)
# Make an api call to kubernetes using curl
curl https://kubernetes.default.svc.cluster.local:443/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor -H "Authorization: Bearer $BEARER_TOKEN" -k
After that you should see metrics for the node
# HELP machine_nvm_capacity NVM capacity value labeled by NVM mode (memory mode or app direct mode).
# TYPE machine_nvm_capacity gauge
machine_nvm_capacity{boot_id="bf88bcb1-f7dc-425d-87cc-ec4994216eb9",machine_id="b1962a4fef066daf20ce3f9adc1ca5e5",mode="app_direct_mode",system_uuid="b1962a4f-ef06-6daf-20ce-3f9adc1ca5e5"} 0
machine_nvm_capacity{boot_id="bf88bcb1-f7dc-425d-87cc-ec4994216eb9",machine_id="b1962a4fef066daf20ce3f9adc1ca5e5",mode="memory_mode",system_uuid="b1962a4f-ef06-6daf-20ce-3f9adc1ca5e5"} 0
You can find a complete list of cAdvisor metrics on the official github repository.
3. Prometheus Configurations
Lets put these pieces together and create a Prometheus configuration that can read from the cAdvisors metrics.
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: kubernetes-cadvisor
honor_timestamps: true
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc.cluster.local:443
- source_labels: [ __meta_kubernetes_node_name ]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
metric_relabel_configs:
- source_labels: [ namespace ]
separator: ;
regex: ^$
replacement: $1
action: drop
- source_labels: [ pod ]
separator: ;
regex: ^$
replacement: $1
action: drop
Cheers!