
Comments (5)

wosiu avatar wosiu commented on August 17, 2024

[Disclaimer: I'm not a maintainer, but...] AFAIK, no, there is no such mechanism on the plugin side. In fact, the ASG should not control any operation that might remove or restart an agent, because the agent might be in the middle of a job.

The way we handle that is to set both of these in the plugin's configuration:

  1. Max Idle Minutes Before Scaledown
  2. Maximum Total Uses

So it's kind of a heuristic, but it works in our case: machines are eventually rotated by one of the 2 settings above (pretty quickly, depending on the numbers you set), and a new machine comes up with the new AMI. If you need something more prompt, I guess you can implement a script to be run in the Jenkins script console (or Scriptler) that marks agents offline, so that they finish their current tasks but do not accept more work.

There is similar information in the FAQ:
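A minimal sketch of the rotation heuristic described above, in Python. This is not the plugin's code: the `Agent` class, `should_retire`, and the sample numbers are hypothetical, and the rule only illustrates how the two settings could combine so that every agent is eventually replaced without interrupting a running job.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    idle_minutes: float  # minutes since the agent last ran a build
    total_uses: int      # number of builds the agent has run so far
    busy: bool = False   # True while a build is running


def should_retire(agent: Agent, max_idle_minutes: int, max_total_uses: int) -> bool:
    """Hypothetical rule: retire an agent that is idle too long or used too often."""
    if agent.busy:
        return False  # never interrupt a running job
    if max_total_uses > 0 and agent.total_uses >= max_total_uses:
        return True
    return max_idle_minutes > 0 and agent.idle_minutes >= max_idle_minutes


agents = [
    Agent("a1", idle_minutes=2, total_uses=3),
    Agent("a2", idle_minutes=45, total_uses=1),
    Agent("a3", idle_minutes=0, total_uses=10, busy=True),
    Agent("a4", idle_minutes=1, total_uses=10),
]
to_retire = [a.name for a in agents if should_retire(a, max_idle_minutes=30, max_total_uses=10)]
print(to_retire)  # prints ['a2', 'a4']
```

With both limits set, a busy agent such as `a3` is only picked up once its job finishes, which is why the rotation is eventual rather than immediate.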

Q: Why does the plugin keep enabling scale-in protection on my ASG?
A: The plugin handles termination of instances manually based on idle period settings. Without scale-in protection enabled, instances could be terminated unexpectedly by external conditions and running jobs could be interrupted.

Having said all that, in my opinion it would be very hard to make the native ASG feature for rolling out a new AMI work with the plugin. The control plane here must be Jenkins. I can imagine Jenkins periodically asking AWS whether a new AMI is configured for a given ASG and then performing the machine rotation, but it wouldn't use the built-in AWS feature.
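To make the idea of a Jenkins-driven rotation concrete, here is a rough Python sketch of the planning step only. Everything in it is an assumption for illustration: `Instance`, `plan_rotation`, and the AMI IDs are made up, and how the current AMI and the per-instance state would be fetched from AWS and Jenkins is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    instance_id: str
    ami_id: str  # AMI the instance was launched from
    busy: bool   # True while the Jenkins agent is running a job


def plan_rotation(instances: List[Instance], current_ami: str) -> Tuple[List[str], List[str]]:
    """Split instances on an outdated AMI into 'drain now' and 'wait for job'."""
    drain, wait = [], []
    for inst in instances:
        if inst.ami_id == current_ami:
            continue  # already on the AMI configured for the ASG
        (wait if inst.busy else drain).append(inst.instance_id)
    return drain, wait


fleet = [
    Instance("i-001", "ami-old", busy=False),
    Instance("i-002", "ami-old", busy=True),
    Instance("i-003", "ami-new", busy=False),
]
drain_now, wait_for_job = plan_rotation(fleet, current_ami="ami-new")
print(drain_now, wait_for_job)  # prints ['i-001'] ['i-002']
```

Run periodically, a check like this would let the controller replace idle outdated machines first and revisit busy ones on the next pass.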

from ec2-fleet-plugin.

gnydick avatar gnydick commented on August 17, 2024

I second the request to support the swarm plugin. It just makes things so much easier if the nodes can connect themselves. There are no credentials or remote management to deal with from the Jenkins controller side. I agree, I don't want my AWS ASG handling scaling. I just would like the ec2-fleet plugin to be able to manage the fleet and be able to identify the swarm created nodes.


michalszelagsonos avatar michalszelagsonos commented on August 17, 2024

To clarify my request, I do think that having Jenkins in control of the scaling makes the most sense. It understands the load, it knows whether a node is running a workload, and it is in the best position to scale in and out when needed. Here is what is driving my questions, which is probably not a typical environment. I have a fleet of macOS AWS instances that Jenkins is managing as nodes. I'd like this pool to have some elasticity, and I want to manage the nodes using immutable infrastructure (AMIs). Note that, due to Apple's licensing terms, a node has to remain up for at least 24 hours before it can be shut down. ASGs support Mac nodes now, which opens the door for some pretty powerful patterns. So, what I'd love to see is as follows:

  • The plugin scales in and out based on load using the existing policy. Since Jenkins has an affinity for nodes that were last used for a job, I believe scale-in based on idle time may be enough to manage this, although another criterion based on age would be wonderful here. If I could scale in only after a minimum of 24 hours plus the idle time, that would be very nice.
  • AMI refresh is a critical feature when it comes to managing the infrastructure, especially when the lifecycle of a node has some longevity to it. I think it would be great to consider some kind of feature in the plugin to facilitate this. Based on the diagram in the docs, the plugin already knows how to discover new nodes, and I am guessing that it uses the ASG API to scale out. Initiating an instance refresh is also an ASG API call, so it doesn't "seem" that crazy to think a controller-initiated refresh is possible. Note that I didn't go through the plugin code, so I am making assumptions based on current capabilities and what the AWS API supports.
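The age-plus-idle criterion from the first bullet could look roughly like this in Python. It is only a sketch of the requested behavior, not something the plugin offers today; `can_scale_in`, its parameters, and the 30-minute idle default are hypothetical.

```python
from datetime import datetime, timedelta


def can_scale_in(
    launched_at: datetime,
    last_busy_at: datetime,
    now: datetime,
    min_age: timedelta = timedelta(hours=24),     # minimum uptime before shutdown
    max_idle: timedelta = timedelta(minutes=30),  # assumed idle threshold
) -> bool:
    """Allow scale-in only when the node is both old enough and idle long enough."""
    old_enough = now - launched_at >= min_age
    idle_enough = now - last_busy_at >= max_idle
    return old_enough and idle_enough


now = datetime(2024, 8, 17, 12, 0)
# Idle for an hour but only 6 hours old: must stay up.
print(can_scale_in(now - timedelta(hours=6), now - timedelta(hours=1), now))   # prints False
# 30 hours old and idle for an hour: eligible.
print(can_scale_in(now - timedelta(hours=30), now - timedelta(hours=1), now))  # prints True
```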

Note that the ASG lifecycle hooks offer some opportunities here to synchronize node activity with the ASG. An alternative approach could be to use the swarm plugin and let the nodes attach dynamically using the agent provided by that plugin, but the problem still exists of how to spin nodes up and down without blowing up the workloads that could be running on them at the time. A lifecycle hook can help, although I'd love to avoid this. In the hook, you can pause the spin-down, let the job finish, and then resume. It would probably involve attaching compute, like Lambda, to the event notifications and making sure it can communicate with the controller and send API calls to it. It's doable, but I don't love this solution. This plugin would be a great solution to this problem.
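For reference, the "pause, let the job finish, resume" flow of a termination lifecycle hook might be sketched like this in Python. The function `hold_until_idle` and its callables are hypothetical. In a real handler, `send_heartbeat` and `complete` would presumably wrap the Auto Scaling `RecordLifecycleActionHeartbeat` and `CompleteLifecycleAction` API calls, and `is_busy` would ask the Jenkins controller about the agent; none of that wiring is shown here.

```python
import time
from typing import Callable


def hold_until_idle(
    is_busy: Callable[[], bool],          # hypothetical: asks Jenkins if the agent runs a job
    send_heartbeat: Callable[[], None],   # hypothetical: extends the lifecycle hook timeout
    complete: Callable[[str], None],      # hypothetical: finishes the lifecycle action
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hold a terminating instance until its agent is idle, then let the ASG proceed."""
    for _ in range(max_polls):
        if not is_busy():
            complete("CONTINUE")  # job finished, termination may proceed
            return True
        send_heartbeat()          # keep the instance in the wait state
        sleep(poll_seconds)
    complete("CONTINUE")          # gave up waiting; the job may be interrupted
    return False
```

The callables are injected so the waiting logic can be exercised without AWS or Jenkins; the cap on polls is there because a lifecycle hook cannot hold an instance forever.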


github-actions avatar github-actions commented on August 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this issue to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.


github-actions avatar github-actions commented on August 17, 2024

This issue was closed because it has become stale with no activity.

