Comments (13)
Additionally, are you using long-term storage with prometheus to feed VPA?
Yes we use thanos
from charts.
Additional remark: we have multiple clients using our setup, and all of the EKS clients are suffering from this while the AKS customers are not, after the same upgrade.
Relevant parameters:
I0816 16:14:07.067381 1 flags.go:57] FLAG: --add-dir-header="false"
I0816 16:14:07.067486 1 flags.go:57] FLAG: --address=":8942"
I0816 16:14:07.067492 1 flags.go:57] FLAG: --alsologtostderr="false"
I0816 16:14:07.067495 1 flags.go:57] FLAG: --checkpoints-gc-interval="10m0s"
I0816 16:14:07.067499 1 flags.go:57] FLAG: --checkpoints-timeout="1m0s"
I0816 16:14:07.067504 1 flags.go:57] FLAG: --container-name-label="container"
I0816 16:14:07.067509 1 flags.go:57] FLAG: --container-namespace-label="namespace"
I0816 16:14:07.067514 1 flags.go:57] FLAG: --container-pod-name-label="pod"
I0816 16:14:07.067517 1 flags.go:57] FLAG: --cpu-histogram-decay-half-life="24h0m0s"
I0816 16:14:07.067522 1 flags.go:57] FLAG: --cpu-integer-post-processor-enabled="false"
I0816 16:14:07.067526 1 flags.go:57] FLAG: --history-length="8d"
I0816 16:14:07.067531 1 flags.go:57] FLAG: --history-resolution="1h"
I0816 16:14:07.067535 1 flags.go:57] FLAG: --kube-api-burst="10"
I0816 16:14:07.067541 1 flags.go:57] FLAG: --kube-api-qps="5"
I0816 16:14:07.067547 1 flags.go:57] FLAG: --kubeconfig=""
I0816 16:14:07.067552 1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0816 16:14:07.067566 1 flags.go:57] FLAG: --log-dir=""
I0816 16:14:07.067571 1 flags.go:57] FLAG: --log-file=""
I0816 16:14:07.067575 1 flags.go:57] FLAG: --log-file-max-size="1800"
I0816 16:14:07.067579 1 flags.go:57] FLAG: --logtostderr="true"
I0816 16:14:07.067584 1 flags.go:57] FLAG: --memory-aggregation-interval="24h0m0s"
I0816 16:14:07.067589 1 flags.go:57] FLAG: --memory-aggregation-interval-count="8"
I0816 16:14:07.067593 1 flags.go:57] FLAG: --memory-histogram-decay-half-life="24h0m0s"
I0816 16:14:07.067597 1 flags.go:57] FLAG: --memory-saver="false"
I0816 16:14:07.067601 1 flags.go:57] FLAG: --metric-for-pod-labels="kube_pod_labels{job=\"kube-state-metrics\"}[8d]"
I0816 16:14:07.067605 1 flags.go:57] FLAG: --min-checkpoints="10"
I0816 16:14:07.067609 1 flags.go:57] FLAG: --one-output="false"
I0816 16:14:07.067613 1 flags.go:57] FLAG: --oom-bump-up-ratio="1.2"
I0816 16:14:07.067618 1 flags.go:57] FLAG: --oom-min-bump-up-bytes="1.048576e+08"
I0816 16:14:07.067623 1 flags.go:57] FLAG: --pod-label-prefix=""
I0816 16:14:07.067627 1 flags.go:57] FLAG: --pod-name-label="pod"
I0816 16:14:07.067631 1 flags.go:57] FLAG: --pod-namespace-label="namespace"
I0816 16:14:07.067635 1 flags.go:57] FLAG: --pod-recommendation-min-cpu-millicores="5"
I0816 16:14:07.067640 1 flags.go:57] FLAG: --pod-recommendation-min-memory-mb="25"
I0816 16:14:07.067645 1 flags.go:57] FLAG: --prometheus-address="http://thanos-query-frontend.prometheus-stack:9090"
I0816 16:14:07.067649 1 flags.go:57] FLAG: --prometheus-cadvisor-job-name="kubelet"
I0816 16:14:07.067653 1 flags.go:57] FLAG: --prometheus-query-timeout="5m"
I0816 16:14:07.067657 1 flags.go:57] FLAG: --recommendation-margin-fraction="0.15"
I0816 16:14:07.067662 1 flags.go:57] FLAG: --recommender-interval="1m0s"
I0816 16:14:07.067667 1 flags.go:57] FLAG: --recommender-name="default"
I0816 16:14:07.067671 1 flags.go:57] FLAG: --skip-headers="false"
I0816 16:14:07.067675 1 flags.go:57] FLAG: --skip-log-headers="false"
I0816 16:14:07.067679 1 flags.go:57] FLAG: --stderrthreshold="2"
I0816 16:14:07.067683 1 flags.go:57] FLAG: --storage="prometheus"
I0816 16:14:07.067686 1 flags.go:57] FLAG: --target-cpu-percentile="0.9"
I0816 16:14:07.067690 1 flags.go:57] FLAG: --v="10"
I0816 16:14:07.067693 1 flags.go:57] FLAG: --vmodule=""
I0816 16:14:07.067697 1 flags.go:57] FLAG: --vpa-object-namespace=""
I0816 16:14:07.067702 1 main.go:82] Vertical Pod Autoscaler 0.13.0 Recommender: 0xc00004d820
Full logs are in your mail :) so as not to leak any sensitive info here.
Helm values are not much different:
vpa:
  recommender:
    extraArgs:
      storage: "prometheus"
      # The prometheus_server_endpoint should have the form http://<service-name>.<namespace-name>.svc:portnumber
      prometheus-address: "http://thanos-query-frontend.prometheus-stack:9090"
      prometheus-cadvisor-job-name: kubelet
      pod-label-prefix: ""
      pod-namespace-label: namespace
      pod-name-label: pod
      container-pod-name-label: pod
      container-name-label: container
      metric-for-pod-labels: kube_pod_labels{job="kube-state-metrics"}[8d]
      pod-recommendation-min-cpu-millicores: 5
      pod-recommendation-min-memory-mb: 25
      v: 10
  updater:
    enabled: false
  admissionController:
    enabled: false
How are you pulling these metrics into Grafana? Is it possible there's actually just an issue with the metrics reporting rather than the actual VPA recommendation itself? The changes from 1.7.5 to 2.x are almost entirely unrelated to the recommender deployment itself.
We use kube-state-metrics to scrape the VPA recommendations. The values in the Grafana dashboard are the same as when checking using kubectl get vpa.
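For reference, these are the series kube-state-metrics has typically exposed for VPA recommendations (the exact names and labels depend on your kube-state-metrics version and configuration, so treat these as assumptions to verify against your own /metrics output):

```
kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
kube_verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget{resource="cpu"}
```

Graphing both series side by side makes it easy to see whether the oscillation is already present in the raw recommendation or only in one of the derived values.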
I also cannot understand why this change would lead to this behaviour. Have you not seen anything like this before?
The only time I've seen erratic recommendations is when I'm not using Prometheus data to feed the recommendations and I don't wait long enough for VPA to generate a good recommendation. Here's a cluster with 53 VPAs, using prometheus data, and the latest chart. (also using kube-state-metrics to poll the VPA data)
Maybe try turning the log level on the recommender up to 10?
I just realized the cluster that I'm showing in that graph above uses the vpa 0.14.0 image. Perhaps there's a bugfix in that version. Worth trying.
It would help if you could share your exact values so I can try to reproduce the issue.
Aha. You're using uncappedTarget, which does not respect the limits set on the VPA or in the defaults: kubernetes/autoscaler#2747 (comment)
"Uncapped Target gives the recommendation before applying constraints specified in the VPA spec, such as min or max."
I would imagine that switching that metric to target would provide more consistent data (that's what my graph above uses).
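To illustrate the difference, here is a minimal sketch (plain Python, not the actual VPA code) of the clamping step: target is effectively uncappedTarget clamped to the minAllowed/maxAllowed bounds from the VPA spec, so a noisy raw recommendation can still produce a stable target.

```python
# Simplified illustration of why "target" can be more stable than
# "uncappedTarget": target is the raw recommendation clamped to the
# minAllowed/maxAllowed bounds, while uncappedTarget is reported as-is.

def capped_target(uncapped: float, min_allowed: float, max_allowed: float) -> float:
    """Clamp a raw recommendation to the allowed range."""
    return max(min_allowed, min(uncapped, max_allowed))

# Noisy uncapped CPU recommendations (millicores) over successive cycles...
raw = [600, 2400, 800, 5100]
# ...all collapse to the same capped value with minAllowed=50m, maxAllowed=500m:
print([capped_target(r, 50, 500) for r in raw])  # → [500, 500, 500, 500]
```

So if your dashboard plots uncappedTarget, you see every swing of the raw estimate; plotting target only shows movement within the configured bounds.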
That was just the first graph being shown by Grafana :) similar images for Target:
Well, now I'm at a loss. Perhaps the VPA folks can help explain why the recommendation status would oscillate so much; I personally haven't seen it do this in my various tests.
I'm guessing that the chart change has nothing to do with it, and that it's something triggered by the re-deploy of the VPA pods. But that's just a hunch.