Comments (6)
Capturing here a valuable comment from @piotrmiskiewicz, from internal Slack:
In my opinion IM should have something like priorities. If the GardenerCluster is new, it should be processed as soon as possible. If the Kubeconfig must be rotated, it can wait a few minutes. Please analyze why there are so many GardenerCluster CRs to process, because changing the timeout from 2 to 3 minutes might also not be enough.
I think we should first better understand how increased load impacts the reconciliation time; the above could be a way to solve the issue.
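To make the priority idea above more concrete, here is a minimal sketch in Go using only the standard library. It is not how IM is implemented: the type names, the two priority levels, and the `processingOrder` helper are all hypothetical and only illustrate "new clusters first, rotations later".

```go
package main

import (
	"container/heap"
	"fmt"
)

// priority is a hypothetical ordering: lower values are processed first.
type priority int

const (
	priorityNewCluster priority = iota // new GardenerCluster: process as soon as possible
	priorityRotation                   // kubeconfig rotation: can wait a few minutes
)

// item is a hypothetical work item, not a type from infrastructure-manager.
type item struct {
	name string
	prio priority
	seq  int // insertion order, keeps FIFO behaviour within one priority
}

// queue implements heap.Interface over work items.
type queue []item

func (q queue) Len() int { return len(q) }
func (q queue) Less(i, j int) bool {
	if q[i].prio != q[j].prio {
		return q[i].prio < q[j].prio
	}
	return q[i].seq < q[j].seq
}
func (q queue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *queue) Push(x any)   { *q = append(*q, x.(item)) }
func (q *queue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// processingOrder returns the names in the order the queue would hand them out.
func processingOrder(items []item) []string {
	q := &queue{}
	for i, it := range items {
		it.seq = i
		heap.Push(q, it)
	}
	var out []string
	for q.Len() > 0 {
		out = append(out, heap.Pop(q).(item).name)
	}
	return out
}

func main() {
	fmt.Println(processingOrder([]item{
		{name: "rotate-a", prio: priorityRotation},
		{name: "new-b", prio: priorityNewCluster},
		{name: "rotate-c", prio: priorityRotation},
		{name: "new-d", prio: priorityNewCluster},
	}))
	// prints [new-b new-d rotate-a rotate-c]
}
```

A real controller would also need to ensure that low-priority items are not starved under sustained load; this sketch does not address that.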
from infrastructure-manager.
About load impacts: you could also consider some random values to avoid such peaks. I don't know why there was such a high load, but I can imagine that a randomized "rotation time" could decrease the peak.
Yep, we had a similar problem with load peaks in the reconciler. Adding jitter helped to distribute the load over time (e.g. https://github.com/octo/retry/blob/master/jitter.go or a sample snippet from reconciler).
This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.
This bot triages issues and PRs according to the following rules:
- After 60d of inactivity, lifecycle/stale is applied
- After 7d of inactivity since lifecycle/stale was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Close this issue or PR with /close
If you think that I work incorrectly, kindly raise an issue with the problem.
/lifecycle stale
The IM was working on ~420 CRs, and we still have to perform tests with ~1000 and ~5000 CRs.
On 24.01.2024 we faced a situation on PROD where it took Infrastructure Manager 141 seconds from Gardener Cluster CR creation to rotate the certificate (internal issue reference: no. 5012).