Example local dev setup with Prometheus and custom metrics
This example uses the local Rig development environment to set up a Prometheus stack emitting custom metrics, which are then used to autoscale a capsule.
Using the Rig CLI you can spin up a Kind cluster with Rig installed for local development. The cluster can also be spun up with a Prometheus stack
rig dev kind create --prometheus
After the command has finished you should be able to see a lot of Prometheus resources created in the prometheus namespace
kubectl get all -n prometheus
should output something like
NAME                                                            READY   STATUS    RESTARTS   AGE
pod/alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          125m
pod/kube-prometheus-stack-grafana-866dd87656-tw4xr              3/3     Running   0          125m
pod/kube-prometheus-stack-kube-state-metrics-5c68dd7f45-c2hx5   1/1     Running   0          125m
pod/kube-prometheus-stack-operator-7db8f987bb-ntrwt             1/1     Running   0          125m
pod/kube-prometheus-stack-prometheus-node-exporter-6dhs6        1/1     Running   0          125m
pod/prometheus-adapter-59d75fbb56-pdwgt                         1/1     Running   0          125m
pod/prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          125m
...
Using the --prometheus flag also sets the prometheusServiceMonitor setting on the Rig operator so it spawns a ServiceMonitor per capsule, monitoring ports named metrics. The Prometheus resource created by rig dev kind create is configured to select ServiceMonitors with the rig.dev/capsule label set (which the operator always sets).
kubectl get prometheus kube-prometheus-stack-prometheus -n prometheus -oyaml | yq '.spec.serviceMonitorSelector'
should output
matchExpressions:
- key: rig.dev/capsule
  operator: Exists
To test the custom metrics we are going to set up two capsules, a producer and a consumer. The producer capsule periodically sends a request to the consumer. For each request, the consumer increments a queue counter and asynchronously decrements it once every 0.5-1.5s. This simulates a set of consumers, each with their own queue of tasks to process. The consumers emit their queue size as a metric we will horizontally scale them on: if the queues get too big, it indicates that there are too few consumers.
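As a rough, hypothetical sketch of that behaviour (assuming Go and the standard Prometheus client library; the endpoint paths and queue capacity are made up for illustration and the actual almostuseful/consumer image may differ), the consumer could look something like this:

// Illustrative sketch of a consumer; the real image may differ.
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// queue_size is the custom metric we will later scale on.
var queueSize = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "queue_size",
	Help: "Number of queued tasks waiting to be processed.",
})

func main() {
	queue := make(chan struct{}, 10000)

	// Every request from the producer enqueues one task.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		queue <- struct{}{}
		queueSize.Inc()
	})

	// Asynchronously "process" one task every 0.5-1.5s.
	go func() {
		for range queue {
			time.Sleep(500*time.Millisecond + time.Duration(rand.Intn(1000))*time.Millisecond)
			queueSize.Dec()
			log.Printf("Queue size %d", len(queue))
		}
	}()

	// Metrics and the request endpoint share port 2112.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}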
Create two capsules like so
rig capsule create consumer
rig capsule create producer
Then deploy the consumer and producer images to them (the code, for the interested)
rig deploy consumer --image almostuseful/consumer
rig deploy producer --image almostuseful/producer
At this point we should be able to see two ServiceMonitors (one for each capsule) created in the Kubernetes namespace of the Rig project (called demo in this case).
kubectl get servicemonitor -n demo
outputs
NAME       AGE
consumer   12m
producer   12m
The consumer capsule has not exposed a port yet, which we need to do. Store
interfaces:
- port: 2112
  name: metrics
in a file, e.g. network.yaml, and deploy it to the capsule
rig deploy consumer --set-network-interface network.yaml
We use port 2112 both for Prometheus metric emission and for the endpoint which the producer calls on the consumer. The name of the port, metrics, is not arbitrary, as the ServiceMonitors are set up to monitor endpoints named metrics. Run
kubectl get servicemonitor consumer -n demo -oyaml | yq '.spec'
to verify.
With this we should be able to see Prometheus pick up a metric queue_size from the consumer. You can port-forward the port which Prometheus exposes
kubectl port-forward pod/prometheus-kube-prometheus-stack-prometheus-0 -n prometheus 9090:9090
and then go to localhost:9090 and query for queue_size.
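Alternatively, you can query the Prometheus HTTP API through the same port-forward from the command line, for example
curl 'http://localhost:9090/api/v1/query?query=queue_size'
which should return a result per consumer pod once the metric has been scraped.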
Although the metric is now picked up by Prometheus, it has yet to be picked up by the Prometheus Adapter so it can be consumed through the custom.metrics.k8s.io API. We need to configure some rules for the Prometheus Adapter to pick up our queue_size metric.
Save the following minimal config in a file, e.g. adapter-config.yaml, and apply it
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
data:
  config.yaml: |
    rules:
    - seriesQuery: 'queue_size'
      metricsQuery: <<.Series>>
kubectl apply -f adapter-config.yaml -n prometheus
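Once the adapter has picked up the new configuration, you can check that the metric is exposed by querying the custom metrics API directly, for example (using the demo namespace from above)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/demo/pods/*/queue_size"
This may return an error until the adapter has actually discovered the metric.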
It can take some minutes for the adapter to scrape the metrics, but it should find the queue_size metric. We are now ready to set up the autoscaler to listen on the custom metrics. The following configuration enables the autoscaler and makes it try to keep the average queue_size from exceeding 5. Save it in a file, e.g. autoscalerconfig.yaml, and deploy it
maxReplicas: 5
minReplicas: 1
customMetrics:
- instance:
    averageValue: "5"
    metricName: queue_size
rig capsule scale horizontal autoscale consumer --path autoscalerconfig.yaml
After this deployment we should be able to see the autoscaler operating and fetching values from the custom metric
kubectl get horizontalpodautoscaler -n demo
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
consumer   Deployment/consumer   4/5       1         5         1          96s
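If you want to see how the custom metric was translated into the HorizontalPodAutoscaler spec, you can inspect it with
kubectl get horizontalpodautoscaler consumer -n demo -oyaml | yq '.spec.metrics'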
Finally, to see the autoscaling kick in we need to up the frequency with which the producer sends requests to the consumers. The default interval between requests is 1 second, but we can shorten it by setting the environment variable REQUEST_TIME on the producer
rig deploy producer --set-env-var REQUEST_TIME=250ms
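As with the consumer, here is a hypothetical sketch of the producer loop (the consumer URL and the exact handling of REQUEST_TIME are assumptions for illustration, not necessarily the actual almostuseful/producer code):

// Illustrative sketch of a producer; the real image may differ.
package main

import (
	"net/http"
	"os"
	"time"
)

func main() {
	interval := time.Second // default: one request per second
	// REQUEST_TIME overrides the interval, e.g. "250ms".
	if v := os.Getenv("REQUEST_TIME"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			interval = d
		}
	}
	for {
		// "consumer" is assumed to resolve to the consumer capsule's service in the cluster.
		if resp, err := http.Get("http://consumer:2112/"); err == nil {
			resp.Body.Close()
		}
		time.Sleep(interval)
	}
}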
An easy way to quickly inspect the queue size of each consumer is to look at the logs of the capsule. For each element a consumer 'pops' from its queue, it prints the queue size
rig capsule logs consumer --follow
consumer-8b76d9989-h8w84 2023-12-06T14:32:23.160167215Z: Queue size 29
consumer-8b76d9989-h8w84 2023-12-06T14:32:24.611777882Z: Queue size 28
consumer-8b76d9989-h8w84 2023-12-06T14:32:25.288842716Z: Queue size 27
consumer-8b76d9989-h8w84 2023-12-06T14:32:26.687771008Z: Queue size 28
consumer-8b76d9989-h8w84 2023-12-06T14:32:27.630559634Z: Queue size 27
consumer-8b76d9989-nzxcf 2023-12-06T14:32:28.288782051Z: Queue size 0
consumer-8b76d9989-gz7mm 2023-12-06T14:32:28.508244509Z: Queue size 1
consumer-8b76d9989-h8w84 2023-12-06T14:32:29.105351051Z: Queue size 28
consumer-8b76d9989-h8w84 2023-12-06T14:32:29.614929968Z: Queue size 28
consumer-8b76d9989-zlfq6 2023-12-06T14:32:29.825761468Z: Queue size 2
consumer-8b76d9989-gz7mm 2023-12-06T14:32:29.994360552Z: Queue size 1
consumer-8b76d9989-zlfq6 2023-12-06T14:32:30.356317593Z: Queue size 1
consumer-8b76d9989-h8w84 2023-12-06T14:32:30.592397135Z: Queue size 27
At the start we have only one consumer with a large queue. This prompts the autoscaler to spin up more instances which then can take on requests, easing the load on the first instance.
This walkthrough shows how you can set up Prometheus and the Prometheus Adapter to work well with Rig to enable autoscaling on custom metrics. Of course, if you have a different metrics setup, Rig can still spin up an autoscaler which monitors custom metrics.