Google Just Why?
GCP Horizontal Pod Autoscaling with Pub/Sub shouldn’t be as complicated as it is. I’m not sure why, but when following this GCP article, granting the adapter access directly through Workload Identity doesn’t appear to work with the Stackdriver adapter.
Instead, I did it the “old” way: the adapter’s Kubernetes service account impersonates a Google service account (GSA).
Assumptions
- You already have a k8s cluster running.
- You have kubectl installed and are authenticated to your cluster
- You have admin permissions in your GCP project to do the following:
  - Create Pub/Sub topics & subscriptions
  - Create service accounts
- You have admin permissions inside your k8s cluster
- You already have Workload Identity turned on for BOTH your cluster and node pool
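Before running anything, it can help to confirm that last assumption. Below is a minimal sketch, assuming gcloud is installed and authenticated and that your gcloud version accepts `--location` (older versions use `--zone` or `--region` instead). It reads the JSON from `gcloud container clusters describe` and checks the cluster's `workloadIdentityConfig.workloadPool` plus each node pool's `workloadMetadataConfig.mode`, which should be `GKE_METADATA`. The function names are my own, not part of any library.

```python
import json
import subprocess


def workload_identity_status(cluster: dict) -> dict:
    """Summarize Workload Identity settings from `gcloud container clusters describe --format=json`."""
    # Cluster level: a workload pool such as PROJECT_ID.svc.id.goog must be set.
    pool = (cluster.get("workloadIdentityConfig") or {}).get("workloadPool")
    # Node pool level: every pool should run the GKE metadata server.
    missing = [
        np.get("name")
        for np in cluster.get("nodePools") or []
        if ((np.get("config") or {}).get("workloadMetadataConfig") or {}).get("mode")
        != "GKE_METADATA"
    ]
    return {"workload_pool": pool, "node_pools_missing_gke_metadata": missing}


def describe_cluster(name: str, location: str) -> dict:
    """Fetch the cluster description (assumes an authenticated gcloud on PATH)."""
    out = subprocess.run(
        ["gcloud", "container", "clusters", "describe", name,
         "--location", location, "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

For example, `workload_identity_status(describe_cluster("my-cluster", "us-central1"))` (cluster name and location are placeholders) should report a non-empty `workload_pool` and an empty `node_pools_missing_gke_metadata` list.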


If all the assumptions are true, you’re ready to run the script below. If not, follow this GCP guide up until “Deploying the Custom Metrics Adapter.”
Let’s Get Down to HPA
First, create a manifest file for the application and name it test-app.yaml.
The script below applies this manifest, so make sure it’s in the working directory when you run the script.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pubsub-sa
---
# [START gke_deployment_pubsub_with_workflow_identity_deployment_pubsub]
# [START container_pubsub_workload_identity_deployment]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      serviceAccountName: pubsub-sa
      containers:
        - name: subscriber
          image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: echo-read
        target:
          type: AverageValue
          averageValue: 2
      type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub
# [END container_pubsub_workload_identity_deployment]
# [END gke_deployment_pubsub_with_workflow_identity_deployment_pubsub]
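The `name` and `selector` in the HPA above are derived from the Cloud Monitoring metric `pubsub.googleapis.com/subscription/num_undelivered_messages`. As I understand the adapter's conventions, slashes in the metric type are replaced with `|` (the name ends up as a single path segment in the external metrics API), and monitored-resource labels are matched with a `resource.labels.` prefix. This hypothetical helper builds the same `metrics` entry as a Python dict, which makes that mapping explicit:

```python
import json


def adapter_metric_name(metric_type: str) -> str:
    """Cloud Monitoring metric type -> external metric name used by the adapter."""
    # Slashes become pipes, e.g. pubsub.googleapis.com|subscription|num_undelivered_messages
    return metric_type.replace("/", "|")


def external_metric_spec(metric_type: str, subscription_id: str, average_value: int) -> dict:
    """Build one autoscaling/v2 HPA `metrics` entry for a Pub/Sub subscription."""
    return {
        "type": "External",
        "external": {
            "metric": {
                "name": adapter_metric_name(metric_type),
                # Monitored-resource labels are selected with the resource.labels. prefix.
                "selector": {
                    "matchLabels": {"resource.labels.subscription_id": subscription_id}
                },
            },
            "target": {"type": "AverageValue", "averageValue": average_value},
        },
    }


if __name__ == "__main__":
    spec = external_metric_spec(
        "pubsub.googleapis.com/subscription/num_undelivered_messages", "echo-read", 2
    )
    print(json.dumps(spec, indent=2))
```

Printing the dict gives the JSON equivalent of the `metrics` list item in test-app.yaml.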
You can find the container’s source code here:
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/databases/cloud-pubsub/main.py
import datetime
import time

# [START gke_pubsub_pull]
# [START container_pubsub_pull]
from google import auth
from google.cloud import pubsub_v1


def main():
    """Continuously pull messages from subscription"""

    # read default project ID
    _, project_id = auth.default()
    subscription_id = 'echo-read'

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process received message"""
        print(f"Received message: ID={message.message_id} Data={message.data}")
        print(f"[{datetime.datetime.now()}] Processing: {message.message_id}")
        time.sleep(3)
        print(f"[{datetime.datetime.now()}] Processed: {message.message_id}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f"Pulling messages from {subscription_path}...")

    with subscriber:
        try:
            streaming_pull_future.result()
        except Exception as e:
            print(e)
# [END container_pubsub_pull]
# [END gke_pubsub_pull]


if __name__ == '__main__':
    main()
Next, create a bash script called run-example.sh:
#!/usr/bin/env bash
# Assumes gcloud is configured with the target project and kubectl points at the cluster.
PROJECT_ID=$(gcloud config get-value project)
SERVICE_ACCOUNT_NAME=custom-metrics-stackdriver
PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format="value(projectNumber)")
EXAMPLE_NAMESPACE=default
PUBSUB_TOPIC=echo
PUBSUB_SUBSCRIPTION=echo-read
ADAPTER_MANIFEST=https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

create() {
  kubectl apply -f "$ADAPTER_MANIFEST"
  sleep 5
  # Applied a second time to make sure every resource gets created
  kubectl apply -f "$ADAPTER_MANIFEST"
  echo "Created custom-metrics namespace and additional resources"

  gcloud iam service-accounts create "$SERVICE_ACCOUNT_NAME" \
    --description="custom metrics stackdriver" \
    --display-name="custom-metrics-stackdriver"
  echo "Created Google service account (GSA) $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  sleep 5 # IAM policy binding sometimes fails if it runs too soon after the service account is created

  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --role roles/monitoring.viewer \
    --member "serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  echo "Added role monitoring.viewer to GSA $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"

  # Let the adapter's KSA (custom-metrics/custom-metrics-stackdriver-adapter) impersonate the GSA
  gcloud iam service-accounts add-iam-policy-binding \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
    "$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  echo "Added IAM policy for KSA custom-metrics-stackdriver-adapter"

  kubectl annotate serviceaccount --namespace custom-metrics \
    custom-metrics-stackdriver-adapter \
    "iam.gke.io/gcp-service-account=$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  echo "Annotated KSA custom-metrics-stackdriver-adapter with GSA $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"

  gcloud pubsub topics create "$PUBSUB_TOPIC"
  sleep 5
  echo "Created topic"
  gcloud pubsub subscriptions create "$PUBSUB_SUBSCRIPTION" --topic="$PUBSUB_TOPIC"
  echo "Created subscription to topic"

  kubectl apply -f test-app.yaml -n "$EXAMPLE_NAMESPACE"
  echo "Deployed test application"

  # Grant the app's KSA (pubsub-sa) Pub/Sub subscriber access directly through Workload Identity
  gcloud projects add-iam-policy-binding "projects/$PROJECT_ID" \
    --role=roles/pubsub.subscriber \
    --member="principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$EXAMPLE_NAMESPACE/sa/pubsub-sa"
  echo "Added Workload Identity access to pubsub-sa"
}

delete() {
  kubectl delete -f test-app.yaml -n "$EXAMPLE_NAMESPACE"
  kubectl delete -f "$ADAPTER_MANIFEST"
  # Remove the project-level binding before deleting the GSA it refers to
  gcloud projects remove-iam-policy-binding "$PROJECT_ID" \
    --role roles/monitoring.viewer \
    --member "serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  echo "Deleting GSA $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  gcloud iam service-accounts delete "$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com" --quiet
  gcloud projects remove-iam-policy-binding "projects/$PROJECT_ID" \
    --role=roles/pubsub.subscriber \
    --member="principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$EXAMPLE_NAMESPACE/sa/pubsub-sa"
  # Delete the subscription before the topic it is attached to
  gcloud pubsub subscriptions delete "$PUBSUB_SUBSCRIPTION"
  gcloud pubsub topics delete "$PUBSUB_TOPIC"
}

# Replace with delete to tear everything down
create
If gcloud prompts you to specify a condition for an IAM policy binding, choose “None”.
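The script grants access in two different ways, which is easy to miss. The adapter's KSA impersonates a GSA (the roles/iam.workloadIdentityUser binding plus the iam.gke.io/gcp-service-account annotation), while the app's pubsub-sa KSA is granted roles/pubsub.subscriber directly as a `principal://` identity. These two small helpers (hypothetical names) just reproduce the member strings run-example.sh builds, so you can compare them against the output of `gcloud iam service-accounts get-iam-policy` and `gcloud projects get-iam-policy`:

```python
def adapter_ksa_member(project_id: str, namespace: str, ksa: str) -> str:
    """Member granted roles/iam.workloadIdentityUser on the GSA (impersonation)."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def direct_ksa_principal(project_number: str, project_id: str, namespace: str, ksa: str) -> str:
    """Principal granted a role directly on the project (no GSA in between)."""
    # Note: the URL uses the project NUMBER, while the pool name uses the project ID.
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{project_id}.svc.id.goog"
        f"/subject/ns/{namespace}/sa/{ksa}"
    )
```

For example, `adapter_ksa_member("my-proj", "custom-metrics", "custom-metrics-stackdriver-adapter")` returns `serviceAccount:my-proj.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]` (my-proj is a placeholder project ID).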
Confirm Application is Working
Make sure the application pod is running:
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
pubsub-7f44cf5977-rbztk   1/1     Running   0          16h
Make sure the HPA is running:
$ kubectl get hpa
NAME     REFERENCE           TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   0/2 (avg)    1         4         1          1m
Let’s trigger an autoscaling event by publishing messages to the echo topic:
for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done
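Each iteration of that loop starts a new gcloud process, so publishing 200 messages takes a while. Here is a sketch of the same thing with the google-cloud-pubsub client library (the same library the container uses), assuming Application Default Credentials are set up locally, for example via `gcloud auth application-default login`. The function names are illustrative.

```python
from typing import List


def build_messages(count: int) -> List[bytes]:
    """Build the same payloads as the shell loop ("Autoscaling #1", "Autoscaling #2", ...)."""
    return [f"Autoscaling #{i}".encode("utf-8") for i in range(1, count + 1)]


def publish_messages(project_id: str, topic_id: str = "echo", count: int = 200) -> None:
    """Publish `count` test messages to the topic using a single client."""
    # Imported lazily so build_messages() works even without google-cloud-pubsub installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # publish() returns a future immediately; the client batches messages in the background.
    futures = [publisher.publish(topic_path, data) for data in build_messages(count)]
    # Block until Pub/Sub has accepted every message (result() returns the message ID).
    for future in futures:
        future.result()
```

Calling `publish_messages("your-project-id")` should fill the echo-read subscription's backlog much faster than the shell loop.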
It’ll take 2-5 minutes for the scaling event to occur. Yes, this is slow.
After a while, you should see that the number of pods has increased, and the HPA status reflects it as well:
$ kubectl get hpa
NAME     REFERENCE           TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   25/2 (avg)   1         4         4          74m
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
pubsub-7f44cf5977-f54hc   1/1     Running   0          25s
pubsub-7f44cf5977-gjbsh   1/1     Running   0          25s
pubsub-7f44cf5977-n7ttr   1/1     Running   0          25s
pubsub-7f44cf5977-xglct   1/1     Running   0          26s
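Why does it stop at 4 pods with 25/2 (avg)? The Kubernetes documentation describes the HPA's core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to minReplicas/maxReplicas, with no change while the ratio stays within a tolerance (0.1 by default). For an AverageValue target, the current value is the per-pod average shown under TARGETS. A simplified sketch that deliberately ignores stabilization windows, scaling policies and pod readiness:

```python
import math


def desired_replicas(metric_total: float, target_average: float,
                     current_replicas: int, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA math for an External metric with an AverageValue target."""
    # Per-pod average, which is what TARGETS shows (e.g. "25/2 (avg)").
    current_average = metric_total / current_replicas
    ratio = current_average / target_average
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

With the numbers above, 4 replicas at an average of 25 means roughly 100 undelivered messages, so the HPA wants ceil(100 / 2) = 50 pods and is capped at maxReplicas: 4.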
Troubleshooting
Always check the output of run-example.sh first; odds are you didn’t have permission to do something. You can always run the delete function (replace the create call at the bottom of the script with delete) and start over.
NOTE: If you start over, you’ll need to change the name of the service account, because GCP does soft deletes on service accounts.
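If you do re-create the GSA under a new name, it has to follow GCP's service account ID rules: 6 to 30 characters of lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen. A small, hypothetical helper that appends a numeric suffix and trims the base name to fit:

```python
import re

# Service account IDs: 6-30 chars, lowercase letters/digits/hyphens,
# starting with a letter and not ending with a hyphen.
ACCOUNT_ID_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])$")


def suffixed_account_id(base: str, attempt: int) -> str:
    """Return a new account ID such as custom-metrics-stackdriver-2."""
    suffix = f"-{attempt}"
    # Trim the base (and any trailing hyphen) so base + suffix fits in 30 characters.
    trimmed = base[: 30 - len(suffix)].rstrip("-")
    candidate = trimmed + suffix
    if not (6 <= len(candidate) <= 30 and ACCOUNT_ID_RE.match(candidate)):
        raise ValueError(f"invalid service account ID: {candidate!r}")
    return candidate
```

Set SERVICE_ACCOUNT_NAME in run-example.sh to the returned value, e.g. `suffixed_account_id("custom-metrics-stackdriver", 2)`.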
Problems
The HPA shows <unknown> under TARGETS.
$ kubectl get hpa
NAME     REFERENCE           TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   <unknown>/2 (avg)   1         4         4          64m
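The TARGETS column comes from the HPA's status. When it reads `<unknown>`, the AbleToScale and ScalingActive conditions usually say why (`kubectl describe hpa pubsub` shows them too). A minimal sketch, assuming kubectl is on your PATH and the HPA is named pubsub as in test-app.yaml, that extracts the current external metric average and either of those two conditions that isn't True (the function names are my own):

```python
import json
import subprocess

# Conditions expected to be "True" on a healthy HPA (ScalingLimited is ignored here).
HEALTHY_WHEN_TRUE = ("AbleToScale", "ScalingActive")


def diagnose_hpa(hpa: dict) -> dict:
    """Summarize an HPA object as returned by `kubectl get hpa NAME -o json`."""
    status = hpa.get("status") or {}
    averages = []
    for metric in status.get("currentMetrics") or []:
        # External metrics report their current value under external.current.
        current = (metric.get("external") or {}).get("current") or {}
        if "averageValue" in current:
            averages.append(current["averageValue"])
    problems = [
        f'{c.get("type")}={c.get("status")}: {c.get("reason", "")} {c.get("message", "")}'.strip()
        for c in status.get("conditions") or []
        if c.get("type") in HEALTHY_WHEN_TRUE and c.get("status") != "True"
    ]
    # An empty averages list roughly corresponds to <unknown> in TARGETS.
    return {"metric_known": bool(averages), "averages": averages, "problems": problems}


def main(name: str = "pubsub", namespace: str = "default") -> None:
    # Requires kubectl on PATH, pointed at the cluster from this post.
    out = subprocess.run(
        ["kubectl", "get", "hpa", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(json.dumps(diagnose_hpa(json.loads(out)), indent=2))
```

Calling `main()` prints a short JSON summary; a `ScalingActive=False` entry usually points at the adapter or its permissions rather than at the application.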
- This usually means some part of the configuration went wrong. Check that every command in run-example.sh executed successfully.
- You can also check the logs of the adapter pod in the custom-metrics namespace to make sure nothing is wrong:
$ kubectl get pods -n custom-metrics
NAME                                                 READY   STATUS    RESTARTS   AGE
custom-metrics-stackdriver-adapter-89fdf8645-bbn4l   1/1     Running   0          5h11m
$ kubectl logs custom-metrics-stackdriver-adapter-89fdf8645-bbn4l -n custom-metrics
I1127 13:52:25.333064 1 adapter.go:217] serverOptions: {true true true true false false false}
I1127 13:52:25.336266 1 adapter.go:227] ListFullCustomMetrics is disabled, which would only list 1 metric resource to reduce memory usage. Add --list-full-custom-metrics to list full metric resources for debugging.
I1127 13:52:29.127164 1 serving.go:374] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
....
- Make sure that the external metrics APIService exists by querying the API server:
$ kubectl proxy --port 8080 &
Starting to serve on 127.0.0.1:8080
$ curl http://localhost:8080/apis/external.metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "externalmetrics",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
If the external metrics APIService is missing, re-run:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
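The same APIService check as a short Python sketch, assuming `kubectl proxy --port 8080` is still running. `serves_external_metrics` looks for the externalmetrics resource in the APIResourceList; if the API group isn't registered at all, `fetch_discovery` will most likely raise `urllib.error.HTTPError` (a 404) instead. Both names are my own.

```python
import json
import urllib.request

# Same URL as the curl command above; assumes kubectl proxy listens on port 8080.
DISCOVERY_URL = "http://localhost:8080/apis/external.metrics.k8s.io/v1beta1"


def serves_external_metrics(discovery: dict) -> bool:
    """True if the APIResourceList advertises the externalmetrics resource."""
    return any(r.get("name") == "externalmetrics" for r in discovery.get("resources") or [])


def fetch_discovery(url: str = DISCOVERY_URL) -> dict:
    """Fetch and parse the discovery document through kubectl proxy."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage: `serves_external_metrics(fetch_discovery())` should return True once the adapter manifest has been applied successfully.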
Thanks for taking the time to read about GCP Horizontal Pod Autoscaling with Pub/Sub.
Cheers!