Perform load and stress test to verify operator's behaviour under load (infrastructure-manager, OPEN)

akgalwas commented on September 27, 2024

Perform load and stress test to verify operator's behaviour under load

from infrastructure-manager.

Comments (6)

Disper commented on September 27, 2024

Capturing here a valuable comment from @piotrmiskiewicz, from internal Slack:

In my opinion, IM should have something like priorities. If the GardenerCluster is new, it should be processed as soon as possible. If the kubeconfig must be rotated, it can wait a few minutes. Please analyze why there are so many GardenerCluster CRs to process, because changing the timeout from 2 to 3 minutes might also not be enough.

I think we should first get a better understanding of how increased load impacts the reconciliation time. The prioritization described above could then be one way to solve the issue.
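
To make the priority idea concrete, here is a minimal sketch using only the Go standard library. It is not how infrastructure-manager works today: the priority constants, the `item` type and `processingOrder` are hypothetical names chosen for illustration. New clusters are handled before kubeconfig rotations, and insertion order is preserved within each priority.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Hypothetical priority levels; a lower value is processed first.
const (
	PriorityNewCluster = 0
	PriorityRotation   = 1
)

// item is a hypothetical work item identifying a GardenerCluster CR.
type item struct {
	name     string
	priority int
	seq      int // insertion order, keeps FIFO within one priority
}

// workQueue implements heap.Interface.
type workQueue []item

func (q workQueue) Len() int { return len(q) }
func (q workQueue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority < q[j].priority
	}
	return q[i].seq < q[j].seq
}
func (q workQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *workQueue) Push(x any)   { *q = append(*q, x.(item)) }
func (q *workQueue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// processingOrder enqueues all items and returns the names in the
// order in which a single worker would pick them up.
func processingOrder(items []item) []string {
	q := &workQueue{}
	for i, it := range items {
		it.seq = i
		heap.Push(q, it)
	}
	order := make([]string, 0, len(items))
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(item).name)
	}
	return order
}

func main() {
	order := processingOrder([]item{
		{name: "rotate-a", priority: PriorityRotation},
		{name: "new-b", priority: PriorityNewCluster},
		{name: "rotate-c", priority: PriorityRotation},
		{name: "new-d", priority: PriorityNewCluster},
	})
	fmt.Println(order) // [new-b new-d rotate-a rotate-c]
}
```

Whether such ordering helps depends on where the time is actually spent, which is what the load test should show first.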

piotrmiskiewicz commented on September 27, 2024

About load impacts: you could also consider some random values to avoid such peaks. I don't know why the load was so high, but I can imagine that a randomized "rotation time" could decrease the peak.

tobiscr commented on September 27, 2024

Yip, we had a similar problem with load peaks in the reconciler. Adding jitter helped to distribute the load over time (e.g. https://github.com/octo/retry/blob/master/jitter.go or a sample snippet from reconciler).

kyma-bot commented on September 27, 2024

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

After 60d of inactivity, lifecycle/stale is applied
After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

Disper commented on September 27, 2024

The IM was working on ~420 CRs, and we still have to perform tests on ~1000 and ~5000 CRs.

Disper commented on September 27, 2024

On 24.01.2024 we faced a situation on PROD where Infrastructure Manager took 141 seconds to rotate the certificate, counted from the creation of the GardenerCluster CR (internal issue reference: no. 5012).
