Comments (6)

Disper commented on September 27, 2024

Capturing here a valuable comment from @piotrmiskiewicz, from internal Slack:

In my opinion IM should have something like priorities. If the GardenerCluster is new, it should be processed as soon as possible. If the kubeconfig must be rotated, it can wait a few minutes. Please analyze why there are so many GardenerCluster CRs to process, because changing the timeout from 2 to 3 minutes might also not be enough.

I think we should first understand how increased load affects the reconciliation time; the prioritization described above could then be one way to solve the issue.

from infrastructure-manager.

piotrmiskiewicz commented on September 27, 2024

About load impact: you could also consider randomizing some values to avoid such peaks. I don't know why the load was so high, but I can imagine that a randomized "rotation time" could flatten the peak.


tobiscr commented on September 27, 2024

Yip, we had a similar problem with load peaks in the reconciler. Adding jitter helped to distribute the load over time (e.g. https://github.com/octo/retry/blob/master/jitter.go or a sample snippet from reconciler)


kyma-bot commented on September 27, 2024

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale


Disper commented on September 27, 2024

At that time IM was handling ~420 CRs; we still have to run tests with ~1000 and ~5000 CRs.


Disper commented on September 27, 2024

On 24.01.2024 we faced a situation on PROD where Infrastructure Manager took 141 seconds after the GardenerCluster CR was created to rotate the certificate (internal issue reference: no. 5012).
