
Is it possible to autoscale a deployment with OpenShift Origin 1.5.0 (Kubernetes 1.5.2) using custom metrics?

The Kubernetes documentation states that autoscaling on custom metrics has been supported since version 1.2. That appears to be true, because the OpenShift horizontal pod autoscaler (HPA) does try to fetch the metrics and compute the desired replica count. But my configuration never succeeds at this. Please help me find what I am doing wrong.

So, what happens:

  • I have set up metrics as recommended in the latest Origin docs (all steps completed): https://docs.openshift.org/latest/install_config/cluster_metrics.html;

    • I have an app which is deployed with a Deployment kind object;
    • this app exposes custom metrics via an HTTP JSON endpoint;
    • the custom metrics are collected and stored; this is visible in the OpenShift Origin UI in the Metrics tab of the corresponding pod;
    • after I create the HPA, a warning about collecting custom metrics appears, saying something like 'Failed collecting custom metrics, did not recieve metrics for any ready pods';
    • I create the HPA with API version 1 and include the annotation alpha/target.custom-metrics.podautoscaler.kubernetes.io: '{"items":[{"name":"requests_count", "value": "10"}]}' (see the manifest sketch after this list);
    • if I request the deployed Heapster app through the master proxy, I receive something like this:

      { "metadata": {}, "items": [ { "metadata": { "name": "resty-1722683747-kmbw0", "namespace": "availability-demo", "creationTimestamp": "2017-05-24T09:50:24Z" }, "timestamp": "2017-05-24T09:50:00Z", "window": "1m0s", "containers": [ { "name": "resty", "usage": { "cpu": "0", "memory": "2372Ki" } } ] } ] }

    • as you can see, there are no custom metrics in the response at all, while my custom metric is named requests_count.
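
For completeness, the HPA is created roughly like this. This is only a sketch reconstructed from the annotation quoted above and the object names visible in the logs below (resty, availability-demo); the replica bounds are arbitrary and the Deployment apiVersion is assumed to be extensions/v1beta1, as in Kubernetes 1.5:

oc create -f - <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: resty
  namespace: availability-demo
  annotations:
    # alpha custom-metrics annotation as described above
    alpha/target.custom-metrics.podautoscaler.kubernetes.io: '{"items":[{"name":"requests_count", "value": "10"}]}'
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: resty
  minReplicas: 1        # arbitrary bounds for the sketch
  maxReplicas: 10
EOF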

What steps should I take to succeed in implementing custom metrics autoscaling?

Screenshot: custom metrics being collected and exposed in the OpenShift Console UI.

UPDATE: In the OpenShift master log, the warning looks like this:

I0524 10:17:47.537985       1 panics.go:76] GET /apis/extensions/v1beta1/namespaces/availability-demo/deployments/resty/scale: (3.379724ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.543354       1 panics.go:76] GET /api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/apis/metrics/v1alpha1/namespaces/availability-demo/pods?labelSelector=app%3Dresty: (4.830135ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.553255       1 panics.go:76] GET /api/v1/namespaces/availability-demo/pods?labelSelector=app%3Dresty: (8.864864ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.559909       1 panics.go:76] GET /api/v1/namespaces/availability-demo/pods?labelSelector=app%3Dresty: (5.725342ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.560977       1 panics.go:76] PATCH /api/v1/namespaces/availability-demo/events/resty.14c14bbf8b89534c: (6.385846ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.565418       1 panics.go:76] GET /api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/availability-demo/pod-list/resty-1722683747-kmbw0/metrics/custom/requests_count?start=2017-05-24T10%3A12%3A47Z: (5.015336ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.569843       1 panics.go:76] GET /api/v1/namespaces/availability-demo/pods?labelSelector=app%3Dresty: (4.040029ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.575530       1 panics.go:76] PUT /apis/autoscaling/v1/namespaces/availability-demo/horizontalpodautoscalers/resty/status: (4.894835ms) 200 [[openshift/v1.5.2+43a9be4 (linux/amd64) kubernetes/43a9be4 system:serviceaccount:openshift-infra:hpa-controller] 10.105.8.81:33945]
I0524 10:17:47.575856       1 horizontal.go:438] Successfully updated status for resty
W0524 10:17:47.575890       1 horizontal.go:104] Failed to reconcile resty: failed to compute desired number of replicas based on Custom Metrics for Deployment/availability-demo/resty: failed to get custom metric value: did not recieve metrics for any ready pods

UPDATE: I found the request that the HPA issues to Heapster through the proxy to gather custom metrics. This request always returns an empty metrics array:

GET /api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/availability-demo/pod-list/availability-example-1694583826-55hqh/metrics/custom/requests_count?start=2017-05-25T13%3A14%3A24Z HTTP/1.1
Host: kubernetes-master:8443
Authorization: Bearer hpa-agent-token

And it returns

{"items":[{"metrics":[],"latestTimestamp":"0001-01-01T00:00:00Z"}]}

UPDATE: It turns out that the HPA requests Heapster through the proxy, and Heapster in its turn requests the Kubernetes "summary" API. The question then becomes: why does the Kubernetes "summary" API not return metrics for the request above, even though the metrics exist?
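
One way to narrow this down is to ask Heapster's model API which metric names it knows for the pod at all; if custom/requests_count is absent from that list, the metric never reached Heapster from the kubelet. A sketch, assuming the model API's metric-listing endpoint is enabled (pod name taken from the request above):

# list all metric names Heapster has stored for this pod
curl -k -H "Authorization: Bearer $TOKEN" \
  "https://kubernetes-master:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/availability-demo/pods/availability-example-1694583826-55hqh/metrics/"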

1 Answer


This might be a wild guess, but I ran into the same issue myself on a self-made cluster. The two things I hit were token problems, where the certificates of my HA master setup were not set up correctly, and another issue with my kube-dns. Not sure whether this is applicable to OpenShift.

jonas kint
  • I would like it to be an authorization issue, that would be easier =) But unfortunately no client or server reports that it cannot verify a certificate chain or resolve a name, and all requests are actually processed on the master. Anyhow, thanks for your response. The issue is still open, and I will keep trying to resolve it. – Aleksey Gvozdev Jun 07 '17 at 09:01