
ec2-fleet-plugin's Introduction

EC2 Fleet Plugin for Jenkins

The EC2 Fleet Plugin scales your Auto Scaling Group, EC2 Fleet, or Spot Fleet automatically for your Jenkins workload.


Overview

The EC2 Fleet Plugin scales your Auto Scaling Group, EC2 Fleet, or Spot Fleet automatically for your Jenkins workload. It launches new instances that match the criteria set in your ASG, EC2 Fleet, or Spot Fleet (e.g. allocation strategy), and terminates idle instances that breach those criteria or the criteria in your Jenkins Cloud configuration.

Warning

AWS strongly discourages using Spot Fleet because it is now categorized as a legacy API with no planned investment. Use Auto Scaling Groups instead.

Features

Minimum Jenkins version: 2.277.2

Note

Jenkins version 2.403 includes significant changes to cloud management. If you are using that version, and see unexpected behavior, create an issue and let us know.

  • Supports EC2 Fleet, Spot Fleet, or Auto Scaling Group as Jenkins Workers
  • Supports all features provided by EC2 Fleet, Spot Fleet, or Auto Scaling Groups e.g. multiple instance types across Spot and On-Demand
  • Auto resubmit failed jobs caused by Spot interruptions
  • No delay scale up strategy: enable No Delay Provision Strategy in configuration
  • Add tags to EC2 instances used by plugin, for easy search, tag format ec2-fleet-plugin:cloud-name=<MyCloud>
  • Allow custom EC2 API endpoint
  • Auto Fleet creation based on Job label (details)
  • Set a maximum total uses to terminate nodes after running the set number of jobs
  • Set a minimum spare size to keep nodes ready for incoming jobs, even when idle
  • Default unique cloud names for UI users (shown as a suggestion in the form) and JCasC users (name: "" will signal the plugin to generate a default name)

Comparison to EC2-Plugin

EC2-Plugin is a similar Jenkins plugin that requests EC2 instances when excess workload is detected. The main difference between the two plugins is that EC2-Fleet-Plugin uses ASG, EC2 Fleet, and Spot Fleet to request and manage instances instead of doing it manually with EC2 RunInstances. This gives EC2-Fleet-Plugin all the benefits of ASG, EC2 Fleet, and Spot Fleet: allocation strategies, automatic availability zone re-balancing (ASG only), access to launch templates and launch configurations, instance weighting, etc. See which-spot-request-method-to-use.

EC2-Fleet-Plugin | EC2-Plugin
Supports On-Demand & Spot Instances | Supports On-Demand & Spot Instances
Scales with ASG, EC2 Fleet, or Spot Fleet | Scales with RunInstances
ASG, EC2 Fleet, and Spot Fleet Allocation Strategies | No Allocation Strategies
Use launch template/config to set instance settings | Manually set instance settings within plugin
Custom instance weighting | No custom instance weighting
Supports mixed configuration like instance types, purchase options | Supports single instance type only

Change Log

This plugin uses Semantic Versioning (SemVer), which means that each plugin version looks like

<major>.<minor>.<bugfix>

major = increased only for backward-incompatible changes
minor = increased when new features are added
bugfix = increased when bugs are fixed

As a result, you can safely update the plugin to any version whose first (major) number is the same as the one you have installed.
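
The upgrade rule can be expressed as a small check. This is a minimal sketch, not part of the plugin: the class and method names are hypothetical, and it assumes plain numeric versions of the form <major>.<minor>.<bugfix> with no pre-release suffix.

```java
/** Hypothetical helper: checks whether two versions belong to the same major line. */
final class SemVerCheck {

    /** Parses "<major>.<minor>.<bugfix>" into three non-negative integers. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected <major>.<minor>.<bugfix>, got: " + version);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
            if (numbers[i] < 0) {
                throw new IllegalArgumentException("negative component in: " + version);
            }
        }
        return numbers;
    }

    /** Returns true when both versions share the same major number. */
    static boolean isSafeUpgrade(String installed, String candidate) {
        return parse(installed)[0] == parse(candidate)[0];
    }
}
```

For example, with illustrative version numbers, SemVerCheck.isSafeUpgrade("2.5.1", "2.7.0") is true, while moving from "2.5.1" to "3.0.0" is flagged as a major change that deserves a look at the release notes first.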

Releases: https://github.com/jenkinsci/ec2-fleet-plugin/releases

Usage

Setup

1. Create AWS Account

Go to AWS account and follow instructions.

2. Create IAM User

Specify programmatic access during creation and record the credentials. These will be used by Jenkins EC2 Fleet Plugin to connect to your EC2 Fleet or Spot Fleet.

Alternatively, you may use AWS EC2 instance roles.

3. Configure User permissions

Add an inline policy to the IAM user or EC2 instance role to allow it to use EC2 Fleet, Spot Fleet, and Auto Scaling Group. AWS documentation about this

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSpotFleetInstances",
        "ec2:ModifySpotFleetRequest",
        "ec2:CreateTags",
        "ec2:DescribeRegions",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeSpotFleetRequests",
        "ec2:DescribeFleets",
        "ec2:DescribeFleetInstances",
        "ec2:ModifyFleet",
        "ec2:DescribeInstanceTypes"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:ListInstanceProfiles",
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": [
            "ec2.amazonaws.com",
            "ec2.amazonaws.com.cn"
          ]
        }
      }
    }
  ]
}

4. Create an Auto Scaling Group, EC2 Fleet, or Spot Fleet

See the AWS CLI reference for create-auto-scaling-group: https://docs.aws.amazon.com/cli/latest/reference/autoscaling/create-auto-scaling-group.html Here is a getting started tutorial for ASG.

Make sure that you:

  • Specify an SSH key that will be used later by Jenkins.
  • Follow Spot best practices, if using Spot.

Warning

AWS strongly discourages using Spot Fleet because it is now categorized as a legacy API with no planned investment. Use Auto Scaling Groups instead. Spot Fleet documentation

5. Configure Jenkins

Once your ASG, EC2 Fleet, or Spot Fleet is ready, you can use it by adding a new EC2 Fleet cloud in the Jenkins configuration.

  1. Go to Manage Jenkins > Plugin Manager
  2. Install EC2 Fleet Jenkins Plugin
  3. Go to Manage Jenkins > Configure Clouds
  4. Click Add a new cloud and select Amazon EC2 Fleet
  5. Configure AWS credentials, or leave empty to use the EC2 instance role
  6. Specify Auto Scaling Group, EC2 Fleet, or Spot Fleet to use

More information on the configuration options can be found here.

Scaling

You can specify the scaling limits in your cloud settings. By default, Jenkins will try to scale the fleet up if there are enough tasks waiting in the build queue and scale down idle nodes after a specified idleness period.
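
To make the role of the limits concrete, here is a minimal sketch of how a target size could be kept within them. It is an illustration under stated assumptions, not the plugin's actual algorithm: the class name and the current and queuedTasks parameters are hypothetical, while minSize, maxSize, and numExecutors mirror the names used in the Groovy config in this README. It ignores idle executors and the idleness timer.

```java
/** Hypothetical illustration of clamping a desired fleet size to configured limits. */
final class ScalingSketch {

    /**
     * Returns the fleet size to request: the current size plus enough nodes
     * for the queued tasks, clamped to the range [minSize, maxSize].
     */
    static int targetCapacity(int current, int queuedTasks, int numExecutors,
                              int minSize, int maxSize) {
        if (numExecutors <= 0) {
            throw new IllegalArgumentException("numExecutors must be positive");
        }
        if (queuedTasks < 0 || minSize > maxSize) {
            throw new IllegalArgumentException("invalid queue length or limits");
        }
        // Ceiling division: each node runs numExecutors tasks at once.
        int extraNodes = (queuedTasks + numExecutors - 1) / numExecutors;
        int wanted = current + extraNodes;
        return Math.max(minSize, Math.min(maxSize, wanted));
    }
}
```

With these assumptions, a fleet of 8 nodes, 10 queued tasks, one executor per node, and maxSize 10 yields a target of 10 rather than 18, so the maximum always caps the request.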

You can use the History tab in the AWS console to view the scaling history.

Groovy

Below is a Groovy script to set up and configure the EC2 Fleet Plugin for Jenkins. You can run the script with the Jenkins Script Console.

import com.amazonaws.services.ec2.model.InstanceType
import com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey.DirectEntryPrivateKeySource
import com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey
import com.cloudbees.jenkins.plugins.awscredentials.AWSCredentialsImpl
import hudson.plugins.sshslaves.SSHConnector
import hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy
import com.cloudbees.plugins.credentials.*
import com.cloudbees.plugins.credentials.domains.Domain
import hudson.model.*
import com.amazon.jenkins.ec2fleet.EC2FleetCloud
import jenkins.model.Jenkins

// Modify only this config; the rest of the script is logic.
config = [
    region: "us-east-1",
    // Spot Fleet ID, EC2 Fleet ID, or Auto Scaling Group Name
    fleetId: "...", 
    idleMinutes: 10,
    minSize: 0,
    maxSize: 10,
    numExecutors: 1,
    awsKeyId: "...",
    secretKey: "...",
    ec2PrivateKey: '''-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----'''
]

// https://github.com/jenkinsci/aws-credentials-plugin/blob/aws-credentials-1.23/src/main/java/com/cloudbees/jenkins/plugins/awscredentials/AWSCredentialsImpl.java
AWSCredentialsImpl awsCredentials = new AWSCredentialsImpl(
  CredentialsScope.GLOBAL,
  "aws-credentials",
  config.awsKeyId,
  config.secretKey,
  "my aws credentials"
)

BasicSSHUserPrivateKey instanceCredentials = new BasicSSHUserPrivateKey(
  CredentialsScope.GLOBAL,
  "instance-ssh-key",
  "ec2-user",
  new DirectEntryPrivateKeySource(config.ec2PrivateKey),
  "",
  "my private key to ssh ec2 for jenkins"
)
// find detailed information about parameters on plugin config page or
// https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetCloud.java
EC2FleetCloud ec2FleetCloud = new EC2FleetCloud(
  "", // fleetCloudName
  null,
  awsCredentials.id,
  config.region,
  "",
  config.fleetId,
  "ec2-fleet",  // labels
  "", // fs root
  new SSHConnector(22,
                   instanceCredentials.id, "", "", "", "", null, 0, 0,
                   // consult the doc for the line below; this one disables host key verification, but you can use a stricter mode
                   // https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/src/main/java/hudson/plugins/sshslaves/verifiers/NonVerifyingKeyVerificationStrategy.java
                   new NonVerifyingKeyVerificationStrategy()),
  false, // privateIpUsed
  false, // alwaysReconnect
  config.idleMinutes, // idle minutes; set > 0 to allow scale-down
  config.minSize, // minSize
  config.maxSize, // maxSize
  0,
  config.numExecutors, // numExecutors
  false, // addNodeOnlyIfRunning
  false, // restrictUsage allow execute only jobs with proper label
  "",
  false,
  180,
  null,
  30,
  true,
  new EC2FleetCloud.NoScaler()
)

// get Jenkins instance
Jenkins jenkins = Jenkins.get()
// get credentials domain
def domain = Domain.global()
// get credentials store
def store = jenkins.getExtensionList('com.cloudbees.plugins.credentials.SystemCredentialsProvider')[0].getStore()
// add credential to store
store.addCredentials(domain, awsCredentials)
store.addCredentials(domain, instanceCredentials)
// add cloud configuration to Jenkins
jenkins.clouds.add(ec2FleetCloud)
// save current Jenkins state to disk
jenkins.save()

Preconfigure Agent

Sometimes you need to prepare an agent (an EC2 instance) before Jenkins can use it. For example, you need to install some software which is required by your builds like Maven, etc.

For those cases you have a few options, described below:

Amazon EC2 AMI

AMI allows you to create custom images for your EC2 instances. For example, you can create an image with Linux plus Java, Maven etc. Then, when EC2 Fleet launches new EC2 instances with this AMI they will automatically get all the required software. Nice =)

  1. Create a custom AMI as described here
  2. Create EC2 Fleet or Spot Fleet with this AMI

EC2 Instance User Data

EC2 instances allow you to specify a User Data script that is executed when an instance first launches. This allows you to customize the setup for a particular instance.

SSH Prefix Verification

EC2 instances don't provide any information about the User Data script execution status, so Jenkins could start a task on a new instance while the script is still in progress. Most of the time Jenkins will repeatedly try to connect to the instance during this time and print out errors until the script completes and Jenkins can connect.

To avoid those errors, you can use the Jenkins SSH Launcher Prefix Start Agent Command setting to specify a command which should fail if User Data is not finished. In that way Jenkins will not be able to connect to the instance until the User Data script is done. More information on configuring the SSH launcher can be found here.

  1. Open Jenkins
  2. Go to Manage Jenkins > Configure System
  3. Find the relevant fleet configuration and click Advanced... for SSH Launcher
  4. Add a check command to the Prefix Start Agent Command field
    • example: java -version &&
  5. To apply for existing instances, restart Jenkins or Delete Nodes from Jenkins so they will be reconnected

FAQ

Check out the FAQ & Gotchas page here.

Development

Plugin usage statistics per Jenkins version can be found here

Releasing

https://jenkins.io/doc/developer/publishing/releasing/

mvn release:prepare release:perform

Jenkins 2 can't connect by SSH

https://issues.jenkins-ci.org/browse/JENKINS-53954

Install Java 8 on EC2 instance

Regular script:

sudo yum install java-1.8.0 -y
sudo yum remove java-1.7.0-openjdk -y
java -version 

User Data Script:

Note: sudo is not required, and -y suppresses confirmation. Don't forget to encode the script with Base64.

#!/bin/bash
yum install java-1.8.0 -y && yum remove java-1.7.0-openjdk -y && java -version
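
The Base64 step can be done with any standard encoder. As a minimal sketch using only the Java standard library (the class name is hypothetical, and UTF-8 is assumed as the script encoding):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for Base64-encoding a User Data script. */
final class UserDataEncoder {

    /** Encodes the script text as standard (non-URL-safe) Base64. */
    static String encode(String script) {
        return Base64.getEncoder().encodeToString(script.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes Base64 back to the script text, useful for checking what was submitted. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Decoding the result and comparing it with the original script is a cheap way to confirm that line endings and the shebang line survived the round trip.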

Contributing

Contributions are welcome! Please read our guidelines and our Code of Conduct.

ec2-fleet-plugin's People

Contributors

basil, brycahta, cyberax, driverpt, eagletmt, elatt, firebike, gavinburris42, h-okon, haugenj, ianfixes, imuqtadir, jimcooley, jzila, marklagendijk, naraharip2017, ndeloof, npeters, odavid, oleg-nenashev, pdk27, schmutze, sopel39, srodriguezo, terma, thepwagner, tim-goto, uriinf, vineeth-bandi, vroy


ec2-fleet-plugin's Issues

Support auto-scaling fleets as well

We're finding that AWS terminates our spot instances too often, which just kills our Jenkins builds.

It would be great if we could specify an auto-scaling fleet either as a backup for spot instances, or instead of the spot fleet.

For the time being I think we'll need to switch to the EC2 plugin, which isn't nearly as robust as this one.

Make getRecurrencePeriod() return Configurable

When running a lot (20 for us) of unique Jenkins clusters with unique Spot Fleets we start running into RequestLimitExceeded error responses from Amazon. Configuring the timer to something less aggressive will help us work around this. The current constant of 10000ms is now hardcoded.
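
One way such a setting could be exposed is through a JVM system property. This is only a sketch of the request, not the plugin's implementation: the class name and the property name ec2fleet.recurrencePeriodMs are hypothetical, and the 10000 ms default is the constant mentioned in this issue.

```java
/** Hypothetical sketch of a recurrence period that can be overridden at startup. */
final class RecurrencePeriodSketch {

    /** Default taken from the hardcoded value mentioned in the issue. */
    static final long DEFAULT_PERIOD_MS = 10_000L;

    /** Hypothetical property name; not defined by the plugin. */
    static final String PROPERTY = "ec2fleet.recurrencePeriodMs";

    /** Returns the configured period, falling back to the default for missing or invalid values. */
    static long recurrencePeriodMs() {
        // Long.getLong returns the default when the property is unset or not a number.
        long configured = Long.getLong(PROPERTY, DEFAULT_PERIOD_MS);
        return configured > 0 ? configured : DEFAULT_PERIOD_MS;
    }
}
```

Under this assumption, starting the JVM with -Dec2fleet.recurrencePeriodMs=60000 would poll once a minute, which reduces the number of API calls per cluster.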

nodes intermittently mapped incorrectly

During periods of rapid creation and termination of instances by the spot request, Jenkins is mapping the wrong IP Addresses to the spotfleet nodes.

I have a pipeline with the following code snippet:

node(nodeTag) {
   sh 'echo "$NODE_NAME:$(curl -s http://169.254.169.254/latest/meta-data/instance-id):$(hostname)"'
}

Here's some live output from that code:

00:04:19.142 [container validation] + echo i-09a15abb0da325e75:i-061d5ad6baabb2a0f:ip-10-21-129-19

As you can see, Jenkins thinks the node is i-09a15abb0da325e75, but the node itself reports that it is i-061d5ad6baabb2a0f

Running Jenkins ver. 2.138
Running EC2 Fleet Jenkins Plugin ver. 1.1.7

Stopping and then starting Jenkins seems to correct the incorrect mapping.

Support manually deleting nodes

In Jenkins you can manually delete nodes by selecting a node and choosing 'Delete agent'.
However, when you do this to any node created by this plugin, it will only delete the Jenkins node, not the actual Amazon node.
Because of this the Jenkins node gets recreated directly after the deletion.

Would it be possible to also delete the Amazon nodes when the Jenkins node is deleted manually?
We sometimes need to manually delete nodes. It would be handy if this could be done directly from Jenkins.

AWS terminates instances under load

When I have plenty of idle instances (see issue #35), the plugin starts to actively down-scale instances and AWS accidentally terminates some instances with running jobs.
As a result:

  1. I have broken jobs with the following error:
    FATAL: java.io.IOException: Unexpected termination of the channel
    java.io.EOFException
    	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2678)
    ...
    Caused: hudson.remoting.RequestAbortedException
    ...
    Agent went offline during the build
    
  2. I have offline instances (common issue, see also #11)

The situation happens only when I have a lot of idle instances (more than 10-20).

how does ec2-fleet resolve the known_hosts issue?

Thanks to both awslabs and jenkins for a great plugin.

If I add an agent using ssh-slave (which ec2-fleet uses), I need to ensure that:

  1. The master's ssh public key is in $JENKINS_HOME/.ssh/authorized_keys on the agent (or passed via param or env var for the docker image)
  2. The agent's host key is in $JENKINS_HOME/.ssh/known_hosts on the master

The agent here is launched dynamically via ec2-fleet, so the host key for the new agent isn't known until after the host launches, when Jenkins tries to connect.

How do you insert the (new) host key into known_hosts, and remove it when a node is deleted (important since IPs can be reused)? Or do most people take the insecure route and disable known hosts verification?

On the other side, how is the public key for the Jenkins master installed onto the spot instance? Is the assumption that I will set up a spot fleet, and configure it via userdata?

Add to readme example of fleet configuration with on-demand

There are multiple use cases for the plugin. One of them is when plugin users don't want to depend entirely on the Spot market and want some basic/minimum capacity to guarantee some throughput. So we want to provide a configuration example for that.

Created from the discussion in #98

Windows nodes not removed after termination

I started a spot fleet in AWS. The bidding went up to $15/hr for several hours. My max bid price was $1.50. All of the instances for my spot instance were terminated by AWS. In Jenkins, I had about 2 dozen dead instances that Jenkins failed to remove. I was able to remove the instances manually one at a time in Jenkins. Deleting the fleet configuration in Jenkins and restarting also removes the instances. The bug is that this plugin does not gracefully handle the case where AWS cancels all of the instances for the spot fleet and is unable to provision additional instances. The instances are marked as down in Jenkins and are not removed. This is currently happening for c3.2xlarge Windows which has been completely sold out for several hours in AWS today. It was not even possible to start an on-demand instance.

plugin starts two times more instances than needed

Hi

I have zero as Minimum Cluster Size.
I have eight different EC2 Fleet clouds with one executor per node.
When I start a job which requires only 96 nodes, I get ~142 nodes up.
The situation happens only when I have 0 running instances before job running.

EC2FleetCloud.updateStatus can cause big delays at jenkins queue

We have a bottleneck at the Jenkins queue because EC2FleetCloud.updateStatus holds the queue lock for a long time. See the stack trace below.
Our fleet has ~100 instances.
I see a lot of messages like this:

Mar 11, 2019 5:08:08 AM com.amazon.jenkins.ec2fleet.EC2FleetCloud terminateInstance
INFO: Attempting to terminate instance: i-079f2eabb225f6cc3
Mar 11, 2019 5:08:15 AM com.amazon.jenkins.ec2fleet.EC2FleetCloud terminateInstance
INFO: Not terminating i-079f2eabb225f6cc3 because we need a minimum of 100 instances running.

During the terminateInstance call, it triggers an updateStatus call, which takes 5-10 sec.
I don't know how often IdleRetentionStrategy is called, but sometimes we had a job not starting for 20 min with 40 executors available.

Here are similar bugs in other plugins
https://issues.jenkins-ci.org/browse/JENKINS-54988
https://issues.jenkins-ci.org/browse/JENKINS-38815

What I understood from the comments on these bugs is that Jenkins can trigger this call very often, so it should be fast.
Perhaps the solution from https://issues.jenkins-ci.org/browse/JENKINS-38815 can be applied here as well: execute describeInstances on a timer and cache the result, so isTerminated can just use the cached result.

"jenkins.util.Timer [#6]" Id=44 Group=main RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
	-  locked java.lang.Object@18d580a8
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	-  locked sun.security.ssl.AppInputStream@5dd71dd6
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1285)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
	at com.amazonaws.services.ec2.AmazonEC2Client.doInvoke(AmazonEC2Client.java:19296)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:19263)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:19252)
	at com.amazonaws.services.ec2.AmazonEC2Client.executeDescribeInstances(AmazonEC2Client.java:9457)
	at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:9429)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.isTerminated(EC2FleetCloud.java:396)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:328)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.terminateInstance(EC2FleetCloud.java:460)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:59)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67d680a7
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
	at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
	at hudson.model.Queue._withLock(Queue.java:1381)
	at hudson.model.Queue.withLock(Queue.java:1258)
	at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 2
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@4b401c2a
	- java.util.concurrent.ThreadPoolExecutor$Worker@5150cb76
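
The caching idea suggested in this issue can be sketched as follows. This is a hypothetical illustration, not the plugin's code: the class name is invented, and the Supplier stands in for the slow describeInstances call so that the example runs without the AWS SDK.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical cache: the slow lookup runs on a timer, the check only reads memory. */
final class TerminatedInstanceCache {

    /** Stands in for the remote call that lists terminated instance IDs. */
    private final Supplier<Set<String>> fetchTerminatedIds;

    /** Latest snapshot; volatile so readers always see a fully built set. */
    private volatile Set<String> terminatedIds = Collections.emptySet();

    TerminatedInstanceCache(Supplier<Set<String>> fetchTerminatedIds) {
        this.fetchTerminatedIds = fetchTerminatedIds;
    }

    /** Intended to be called from a background timer, outside the queue lock. */
    void refresh() {
        terminatedIds = Collections.unmodifiableSet(new HashSet<>(fetchTerminatedIds.get()));
    }

    /** Cheap lookup that never performs a remote call. */
    boolean isTerminated(String instanceId) {
        return terminatedIds.contains(instanceId);
    }
}
```

A ScheduledExecutorService could call refresh() with scheduleWithFixedDelay, so the retention check no longer blocks on the network while holding the queue lock. The trade-off is that answers can be stale by up to one refresh interval.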

DeadLock

This is a blocking issue

jenkins.util.Timer [#7]

"jenkins.util.Timer [#7]" Id=47 Group=main WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@54eab1a8 owned by "jenkins.util.Timer [#9]" Id=49
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@54eab1a8
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at hudson.model.Queue._withLock(Queue.java:1336)
	at hudson.model.Queue.withLock(Queue.java:1215)
	at jenkins.model.Nodes.addNode(Nodes.java:133)
	at jenkins.model.Jenkins.addNode(Jenkins.java:2116)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewSlave(EC2FleetCloud.java:361)
	-  locked hudson.model.Hudson@6a65d440
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:318)
	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@3a37742f
	at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:42)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

	Number of locked synchronizers = 1
	- java.util.concurrent.ThreadPoolExecutor$Worker@159d21f3


jenkins.util.Timer [#9]

"jenkins.util.Timer [#9]" Id=49 Group=main BLOCKED on com.amazon.jenkins.ec2fleet.EC2FleetCloud@3a37742f owned by "jenkins.util.Timer [#7]" Id=47
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:38)
	-  blocked on com.amazon.jenkins.ec2fleet.EC2FleetCloud@3a37742f
	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
	at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
	at hudson.model.Queue._withLock(Queue.java:1338)
	at hudson.model.Queue.withLock(Queue.java:1215)
	at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

	Number of locked synchronizers = 2
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@54eab1a8
	- java.util.concurrent.ThreadPoolExecutor$Worker@7e68c860
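
The two traces show a lock-order inversion: Timer #7 holds the EC2FleetCloud monitor and waits for the queue lock, while Timer #9 holds the queue lock and is blocked on the same EC2FleetCloud monitor. A common remedy is to make every code path take the locks in one fixed order. The sketch below illustrates that technique with stand-in locks; the class and method names are hypothetical and this is not the plugin's actual fix.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration: both paths take the queue lock first, then the cloud monitor. */
final class LockOrderSketch {

    private final ReentrantLock queueLock = new ReentrantLock(); // stands in for the Jenkins queue lock
    private final Object cloudMonitor = new Object();            // stands in for the EC2FleetCloud monitor

    private int statusUpdates; // guarded by cloudMonitor
    private int idleChecks;    // guarded by cloudMonitor

    /** Runs the action while holding both locks, always acquired in the same order. */
    private void withBothLocks(Runnable action) {
        queueLock.lock();
        try {
            synchronized (cloudMonitor) {
                action.run();
            }
        } finally {
            queueLock.unlock();
        }
    }

    /** Path corresponding to the periodic status update. */
    void updateStatus() {
        withBothLocks(() -> statusUpdates++);
    }

    /** Path corresponding to the idle retention check. */
    void idleCheck() {
        withBothLocks(() -> idleChecks++);
    }

    int statusUpdates() {
        synchronized (cloudMonitor) {
            return statusUpdates;
        }
    }

    int idleChecks() {
        synchronized (cloudMonitor) {
            return idleChecks;
        }
    }
}
```

Because neither path ever holds the cloud monitor while waiting for the queue lock, the circular wait seen in the traces cannot form.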

Scaling down to zero instances works, but then won't scale back up

When setting the min number of instances to zero, the plugin successfully scales the instances down after the defined idle time. However, it doesn't then scale up to the desired size once jobs start coming in.

I've tested this with a build queue size of 1, 5, 10 and executors (per node) of 1 and 4

edit: Just to give some context, this only occurs when I have other nodes that I explicitly mark as offline. As soon as I delete the offline nodes, scaling works as expected.

New nodes get I/O error and disconnect at ssh timeout

When the ec2-fleet-plugin adds new spotfleet instances as Jenkins nodes using the Launcher selection of "Launch agent agents via SSH", the nodes all connect just fine, but some percentage of them disconnect with a SEVERE I/O error shortly after launch. The I/O error happens at exactly the configured "Connection Timeout in Seconds" from launch. When reconnected after that, they have no problems.

I have not confirmed, but I think this only happens when "Max Idle Minutes Before Scaledown" is set (attaching the IdleRetentionStrategy to the node).

Here's the sequence from the logs with the default ssh connection timeout of 210 seconds:

15:01:11.832 - INFO: Found new instances from fleet (ec2-fleet test): [<snip>, i-08db665d464785aec, <snip>]
15:01:21.966 - INFO: Idle Retention initiated
15:01:21.967 - INFO: Attempting to reconnect i-08db665d464785aec
15:01:56.067 - SSH Launch of i-08db665d464785aec on 10.21.131.211 completed in 34,083 ms
15:04:51.990 - SEVERE: I/O error in channel i-08db665d464785aec

Use instance profile for credentials

When running a Jenkins instance on an EC2 instance with an instance profile attached, I should have the option of not providing hard-coded AWS credentials. Is this a possibility?

Set Hostname and Tags when scaling up

When our agents scale up based on need, they pull from our AMI that was created from an instance. When the agents scale up they are creating their hostname from the original instance name that was used to create the AMI. Is there a way on the fly as it is scaling up to change the hostname? And possibly add tags to the instance?

NPE in EC2FleetCloud.java:319 breaking everything

Timer task com.amazon.jenkins.ec2fleet.CloudNanny@a7a8e60 failed
java.lang.NullPointerException
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:319)
	at com.amazon.jenkins.ec2fleet.CloudNanny$1.call(CloudNanny.java:46)
	at com.amazon.jenkins.ec2fleet.CloudNanny$1.call(CloudNanny.java:43)
	at hudson.model.Queue._withLock(Queue.java:1438)
	at hudson.model.Queue.withLock(Queue.java:1299)
	at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:43)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
May 29, 2019 5:34:05 PM INFO com.amazon.jenkins.ec2fleet.EC2FleetCloud provisionInternal
Start provision label = linux, excessWorkload = 4
May 29, 2019 5:34:05 PM WARNING com.amazon.jenkins.ec2fleet.EC2FleetCloud provision
provisionInternal failed
java.lang.NullPointerException
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:319)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.provisionInternal(EC2FleetCloud.java:260)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud$1.call(EC2FleetCloud.java:247)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud$1.call(EC2FleetCloud.java:244)
	at hudson.model.Queue._withLock(Queue.java:1438)
	at hudson.model.Queue.withLock(Queue.java:1299)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.provision(EC2FleetCloud.java:244)
	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:714)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:808)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Add planned nodes as PlannedNode, to prevent overprovisioning.

Jenkins core has nice logic for provisioning new nodes. When it is calculating the excessWorkload it considers the PlannedNodes, so it will not overprovision.

@terma commented on #53:

I've checked the Jenkins provisioning logic. It looks like Jenkins correctly considers planned nodes when calculating excessWorkload (code here); additionally, that is confirmed by this article.

My theory is that we overprovision because new nodes are added to Jenkins incorrectly. Here is the code showing how we do that now: we add nodes directly to Jenkins, while they should be added as PlannedNode (more about that).

I believe that a correct implementation would work as follows:

  • When new nodes are created via the AWS api, they are also directly added as PlannedNode to Jenkins.
  • Only when new nodes can be connected to are they transformed from PlannedNode to a normal node.
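
To illustrate why tracking pending instances as planned capacity matters, here is a deliberately simplified model of the excess-workload calculation. It is not Jenkins core code; the class, the method and the numbers are hypothetical.

```java
final class ExcessWorkloadSketch {
    /**
     * Simplified model: demand that is not covered by idle executors or by
     * capacity that is already on its way. Never negative.
     */
    static int excessWorkload(int queuedBuilds, int idleExecutors, int plannedExecutors) {
        return Math.max(0, queuedBuilds - idleExecutors - plannedExecutors);
    }

    public static void main(String[] args) {
        int queued = 4;
        // Four instances were requested from AWS but are not tracked as planned capacity:
        System.out.println(excessWorkload(queued, 0, 0)); // 4, so four more would be requested
        // The same four instances tracked as planned capacity:
        System.out.println(excessWorkload(queued, 0, 4)); // 0, so nothing further is requested
    }
}
```

In this model, capacity that has been requested but is invisible to the calculation gets requested again on the next cycle, which would match the overprovisioning described above.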

AWS termination of node -- kick off new builds for affected jobs

When AWS kills a node because of some external event (such as it wants that instance type back), then jobs running on those nodes will wait until timeout, or forever, while the node is marked offline.

Instead, it would be nice if, when a node disappears for any reason while it has active jobs on it, (1) those jobs are killed and (2) those jobs are "kicked".

In my case it's a bunch of Jenkins pipeline jobs, and it would be great to kick them so that another executor can hopefully pick up the job and complete it before the next node is kicked over by Amazon.

Plugin will not remove new nodes that have not executed anything yet

In IdleRetentionStrategy, for a new node that has not yet executed any task, getIdleStartMilliseconds returns Long.MIN_VALUE, which is not treated as idle time. This could keep instances that are not required.
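
A minimal sketch of one way to handle that sentinel, assuming the behaviour described in this report. The class, the method and the `nodeAddedMillis` fallback are hypothetical and are not taken from the plugin. It also shows one plausible reason the sentinel is never seen as idle: subtracting `Long.MIN_VALUE` from the current time overflows to a negative number.

```java
final class IdleCheckSketch {
    /** Sentinel reported for a node that has never run a task (per the report above). */
    static final long NEVER_EXECUTED = Long.MIN_VALUE;

    /**
     * Treats a node that never ran anything as idle since it was added,
     * instead of comparing against the sentinel directly.
     */
    static boolean isIdleTooLong(long idleStartMillis, long nodeAddedMillis,
                                 long nowMillis, long maxIdleMillis) {
        long effectiveStart = (idleStartMillis == NEVER_EXECUTED) ? nodeAddedMillis : idleStartMillis;
        return nowMillis - effectiveStart >= maxIdleMillis;
    }

    public static void main(String[] args) {
        long now = 61_000L;
        // Naive subtraction overflows, so the node never looks idle:
        System.out.println((now - NEVER_EXECUTED) < 0); // true
        // With the fallback, a node added at t=1000 ms is idle after a 60 s limit:
        System.out.println(isIdleTooLong(NEVER_EXECUTED, 1_000L, now, 60_000L)); // true
    }
}
```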

RequestLimitExceeded: EC2 fleet plugin making too many requests to DescribeSpotFleetRequests and DescribeInstances

We have a Jenkins environment with five EC2 fleets configured. We've been running into RequestLimitExceeded issues in our AWS account. After opening a support ticket with AWS, we discovered that our Jenkins master server instance is making over 270,000 API requests per day, over 17,000 of which hit RequestLimitExceeded. The only connection to AWS that we have configured on our master is the AWS EC2 fleet plugin.

Looking in CloudTrail, we see our Jenkins spot fleet user making on average over 2,000 requests per hour to DescribeSpotFleetRequests and over 2,200 requests per hour to DescribeInstances.
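
For scale, the daily figures above can be converted into hourly and per-fleet rates. This is plain arithmetic on the reported numbers, which are lower bounds ("over ..."); the class and method names are illustrative.

```java
final class RequestRateSketch {
    static double perHour(long perDay) {
        return perDay / 24.0;
    }

    static double percent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        long requestsPerDay = 270_000L;
        long throttledPerDay = 17_000L;
        int fleets = 5;

        double hourly = perHour(requestsPerDay);
        System.out.println("Requests per hour: " + hourly);                  // 11250.0
        System.out.println("Requests per hour per fleet: " + hourly / fleets); // 2250.0
        System.out.printf("Throttled share: %.1f%%%n", percent(throttledPerDay, requestsPerDay));
    }
}
```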

SSLHandshakeException while adding the AWS Credentials

Hi All,

I am getting the exceptions below when I try to add AWS credentials to the Jenkins credentials store.
Can someone please suggest which certs it is looking for?

Jan 07, 2019 11:21:31 AM org.eclipse.jetty.server.handler.ContextHandler$Context log
WARNING: Error while serving https://myjenkins.com/descriptorByName/com.cloudbees.jenkins.plugins.awscredentials.AWSCredentialsImpl/checkSecretKey
java.lang.reflect.InvocationTargetException
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:347)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845)
at org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:649)
at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:225)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at com.cloudbees.jenkins.support.slowrequest.SlowRequestFilter.doFilter(SlowRequestFilter.java:37)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:564)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1163)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1109)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1368)
at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1335)
at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1324)
at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:491)
at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:464)
at com.cloudbees.jenkins.plugins.awscredentials.AWSCredentialsImpl$DescriptorImpl.doCheckSecretKey(AWSCredentialsImpl.java:220)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
... 86 more
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at sun.reflect.GeneratedMethodAccessor407.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
at com.amazonaws.http.conn.$Proxy77.connect(Unknown Source)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1285)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
... 100 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
... 126 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
... 132 more

Use an external SSH client process to connect to the slaves

Have we ever considered this idea? I've found that the slaves are getting terminated more often since I started using this plugin, always with the same error pattern:

Apr 08, 2019 6:55:13 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel i-0544c80944d28d2a7
java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
        at hudson.remoting.Command.readFrom(Command.java:140)
        at hudson.remoting.Command.readFrom(Command.java:126)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

I understand this is an issue of https://github.com/jenkinsci/remoting but it didn't happen to me when I used the https://github.com/jenkinsci/ec2-plugin with that feature turned on.

ConcurrentModificationException in EC2FleetCloud.updateStatus

May 29, 2019 7:00:30 PM INFO com.amazon.jenkins.ec2fleet.EC2FleetCloud updateStatus
Fleet (linux) no longer has the instance i-0fe511f1b6c241ea3, removing from Jenkins.
May 29, 2019 7:00:30 PM SEVERE hudson.triggers.SafeTimerTask run
Timer task com.amazon.jenkins.ec2fleet.CloudNanny@713204b8 failed
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
	at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:379)
	at com.amazon.jenkins.ec2fleet.CloudNanny$1.call(CloudNanny.java:48)
	at com.amazon.jenkins.ec2fleet.CloudNanny$1.call(CloudNanny.java:44)
	at hudson.model.Queue._withLock(Queue.java:1438)
	at hudson.model.Queue.withLock(Queue.java:1299)
	at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:44)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
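
The trace shows `HashMap$KeyIterator.next` failing inside `updateStatus`, right after an instance was removed. A common cause of this exception is removing entries from a collection while iterating over it, although concurrent access from another thread is also possible. The sketch below shows the usual single-threaded remedy, removing through the iterator. It does not reproduce the plugin's code, and the class and variable names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

final class SafeRemovalSketch {
    /** Drops every known instance id that the fleet no longer reports. */
    static void removeMissing(Set<String> knownInstanceIds, Set<String> fleetInstanceIds) {
        Iterator<String> it = knownInstanceIds.iterator();
        while (it.hasNext()) {
            String id = it.next();
            if (!fleetInstanceIds.contains(id)) {
                it.remove(); // safe: the removal goes through the iterator itself
            }
        }
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("i-aaa", "i-bbb", "i-ccc"));
        Set<String> fleet = new HashSet<>(Arrays.asList("i-aaa", "i-ccc"));
        removeMissing(known, fleet);
        System.out.println(known.contains("i-bbb")); // false
    }
}
```

Calling `knownInstanceIds.remove(id)` inside a for-each loop over the same set is the pattern that typically triggers ConcurrentModificationException.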

Plugin doesn't scale nodes

Hi,

I have noticed that scaling isn't working correctly.

Case 1: There are a few builds in the queue
Plugin: Doesn't create new nodes and doesn't send a request to AWS to increase fleet capacity (even though spawning them wouldn't exceed max-cluster-size);

Case 2: Min cluster size is greater than 1
Plugin: Again, it will only create 1 node and will not spawn more.

Case 3: AWS spot fleet size is 2 and min cluster size is 2
Plugin: After some time the plugin will set the AWS spot fleet capacity to 1, but will keep min cluster size at 2

The weirdest part is that these are recent bugs: I have used this plugin for some time and haven't noticed such issues before.

If someone knows what is going on, please answer (or just bugfix it, if it really is a problem with the plugin).

@VackarAfzal In my case target capacity and min size behave strangely; please correct me if my understanding is wrong. When I set the target capacity in Jenkins to 2 and min size to 0, it immediately sets the target capacity to 2 in the spot fleet, even when there is a job that does not require that much. Isn't it supposed to create just one instance and scale up based on demand?

Originally posted by @ebalakumar in #52 (comment)

Test Connection button on configure screen does not test with newly supplied credentials

Steps to reproduce:

  • Fresh deployment of Jenkins with the ec2-fleet plugin installed;
  • Instance profile for the Jenkins instance has no ec2 permissions configured, IAM User has ec2:* permissions defined in IAM as well as a secret key/access key id;
  • During configuration of plugin at Manage Jenkins > Configure Jenkins, set the "AWS Crendentials" (sic: there's a typo here that also needs to be fixed) as the access key id & secret key created above. Fill out the rest of the form then hit "Test Connection". This fails with "You are not authorized to perform this operation. (Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: xxxxxxxxxxxx)"
  • Add the ec2:describeSpotFleetInstances permission to the Instance profile role. Repeat "Test Connection" and "Success" is displayed.

How do you set per-node environment variables?

We're trying to set some Jenkins environment variables for use in our declarative pipeline for the spot fleet. We were trying to use the SSH agent command prefix to do so, and finding that it seems to randomly work sometimes and not work other times. In our case we just need to set some environment variables as being the same across all of the nodes in the fleet (but we have other hardware agents that have varying values for these environment variables). Is there a better way to do this? /etc/profile also doesn't seem to work reliably. Thanks!

Cluster minSize ideally should accept also 0

See:

<f:number clazz="required positive-number" default="1" />

Cluster minSize should ideally also accept 0, so that the cluster can scale down to zero. Currently the minimum is 1, meaning that even if no tasks are running, one node is still hanging around and burning money. This is particularly important for periodic tasks, when the cluster is idle between runs.

EDIT:
perhaps it would be sufficient to change line 49 to:
<f:number clazz="required number" default="1" /> ?

Terminate nodes if not online within X minutes

I'm using cloud-init to bring my nodes online, however every so often when they reboot after doing the package installation/upgrades the system won't come back online due to issues with NVMe or some other error.

In those cases instead of considering them up because they are up according to Amazon it would be neat if the instance is terminated with prejudice so that Amazon can kick over a new instance to replace it.

This doesn't happen often, but not needing to manually do this would be fantastic!

My nodes should come online within 5 - 10 minutes, if they still aren't doing work by then, shoot em in the head.
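
The rule being asked for can be sketched as a simple deadline check. This is only an illustration of the request, not existing plugin behaviour; the class, method and parameter names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class OnlineDeadlineSketch {
    /** True when a node is still offline after the allowed startup window. */
    static boolean shouldTerminate(Instant launchedAt, boolean online, Instant now, Duration maxStartup) {
        return !online && Duration.between(launchedAt, now).compareTo(maxStartup) > 0;
    }

    public static void main(String[] args) {
        Instant launched = Instant.parse("2019-01-01T10:00:00Z");
        Duration limit = Duration.ofMinutes(10);
        // Still offline 12 minutes after launch: terminate so a replacement can be started.
        System.out.println(shouldTerminate(launched, false, launched.plus(Duration.ofMinutes(12)), limit));
        // Offline but only 4 minutes old: keep waiting.
        System.out.println(shouldTerminate(launched, false, launched.plus(Duration.ofMinutes(4)), limit));
    }
}
```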

Allow to set node names to custom pattern

It would be nice if there was an option to set the name of new nodes coming up.

Allowing to set a naming scheme with placeholders (e.g. myname-${number}) would allow for some more intuitive names than the instance id. Other variables could be the date, vpc, instance-type, tags, ips, dns-entries, etc.

In my particular case instances are setting their name during launch automatically and so being able to read the "NAME" tag from an instance would be a nice feature to have.

Plugin calculates workload incorrectly by including all Jenkins labels

Expectation:

  • plugin calculates workload using only the labels that are assigned to the EC2 fleet configuration

Reality:

  • plugin calculates workload using all available executors including nodes without EC2 Fleet labels

Example:
We noticed this because we have one local machine with 15 executors that are not always in use. Our build queue was building up to more than 10 builds without triggering any new EC2 nodes. We waited between 30 minutes and 1 hour for a new node before deciding to test marking the local machine temporarily offline. Moments later, new nodes were scheduled to handle the build queue.
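
The expected behaviour can be sketched as a label-scoped capacity count. This is a simplified model rather than plugin or Jenkins code: the `Node` class, the label names and the assumption that all 15 local executors are idle are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LabelCapacitySketch {
    /** Hypothetical stand-in for a Jenkins node: its labels and idle executor count. */
    static final class Node {
        final Set<String> labels;
        final int idleExecutors;

        Node(int idleExecutors, String... labels) {
            this.idleExecutors = idleExecutors;
            this.labels = new HashSet<>(Arrays.asList(labels));
        }
    }

    /** Idle executors on nodes carrying the given label; other nodes are ignored. */
    static int idleExecutorsFor(String label, List<Node> nodes) {
        int idle = 0;
        for (Node node : nodes) {
            if (node.labels.contains(label)) {
                idle += node.idleExecutors;
            }
        }
        return idle;
    }

    /** Queued builds for a label that idle executors with that label cannot absorb. */
    static int excessWorkload(String label, int queuedForLabel, List<Node> nodes) {
        return Math.max(0, queuedForLabel - idleExecutorsFor(label, nodes));
    }

    public static void main(String[] args) {
        // Assumed setup: one local machine with 15 idle executors and no fleet label.
        List<Node> nodes = Arrays.asList(new Node(15, "local"));
        System.out.println(excessWorkload("linux", 10, nodes)); // 10: the local machine does not count
    }
}
```

If the 15 local executors were counted regardless of label, the same queue would yield an excess of 0, which matches the stall described in the example.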

UnsupportedClassVersionError: Unsupported major.minor version 52.0

I try to start a spot fleet instance but get an error on ssh connection.

Jenkins v2.150.1

$ java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

Agent Log:

[01/11/19 10:20:16] [SSH] Opening SSH connection to x.x.x.x:22.
[01/11/19 10:20:16] [SSH] The SSH key with fingerprint x:x:x:x:x:x:x:x has been automatically trusted for connections to this machine.
[01/11/19 10:20:16] [SSH] Authentication successful.
[01/11/19 10:20:16] [SSH] The remote user's environment is:
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
AWS_ELB_HOME=/opt/aws/apitools/elb
AWS_PATH=/opt/aws
BASH=/bin/bash
BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_EXECUTION_STRING=set
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
BASH_VERSION='4.2.46(2)-release'
DIRSTACK=()
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
EC2_HOME=/opt/aws/apitools/ec2
EUID=500
GROUPS=()
HOME=/home/ec2-user
HOSTNAME=ip-x-x-x-x
HOSTTYPE=x86_64
ID=500
IFS=$' \t\n'
JAVA_HOME=/usr/lib/jvm/jre
LANG=en_US.UTF-8
LESSOPEN='||/usr/bin/lesspipe.sh %s'
LESS_TERMCAP_mb=$'\E[01;31m'
LESS_TERMCAP_md=$'\E[01;38;5;208m'
LESS_TERMCAP_me=$'\E[0m'
LESS_TERMCAP_se=$'\E[0m'
LESS_TERMCAP_ue=$'\E[0m'
LESS_TERMCAP_us=$'\E[04;38;5;111m'
LOGNAME=ec2-user
MACHTYPE=x86_64-redhat-linux-gnu
MAIL=/var/mail/ec2-user
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/bin:/bin:/usr/bin:/opt/aws/bin
PIPESTATUS=([0]="0")
PPID=26980
PS4='+ '
PWD=/home/ec2-user
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=1
SSH_CLIENT='x.x.x.x 55306 22'
SSH_CONNECTION='x.x.x.x 55306 x.x.x.x 22'
TERM=dumb
UID=500
USER=ec2-user
_=/etc/bashrc
[01/11/19 10:20:16] [SSH] Starting sftp client.
[01/11/19 10:20:16] [SSH] Remote file system root /tmp/jenkins-d663ed22 does not exist. Will try to create it...
[01/11/19 10:20:16] [SSH] Copying latest remoting.jar...
[01/11/19 10:20:17] [SSH] Copied 776,717 bytes.
Expanded the channel window size to 4MB
[01/11/19 10:20:17] [SSH] Starting agent process: cd "/tmp/jenkins-d663ed22" && /usr/bin/java  -jar remoting.jar -workDir /tmp/jenkins-d663ed22
Exception in thread "main" java.lang.UnsupportedClassVersionError: hudson/remoting/Launcher : Unsupported major.minor version 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:808)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:443)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:65)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:349)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:348)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:323)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Slave JVM has terminated. Exit code=1
[01/11/19 10:20:17] Launch failed - cleaning up connection
[01/11/19 10:20:17] [SSH] Connection closed.
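
"Unsupported major.minor version 52.0" means the class was compiled for class-file major version 52, which corresponds to Java 8, and the JVM that tried to load it is older. The `java -version` output above reports 1.8.0_191, so it presumably comes from a different machine or a different binary than the `/usr/bin/java` started on the agent; that is an assumption, not something the log confirms. The sketch below shows how the major version is read from a class-file header and mapped to a Java release. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;

final class ClassVersionSketch {
    /** Reads the major version from the first 8 bytes of a class file. */
    static int majorVersion(byte[] classFileHeader) {
        ByteBuffer buf = ByteBuffer.wrap(classFileHeader); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, unused here
        return buf.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
    }

    /** Class-file major version 52 is Java 8, 51 is Java 7, and so on. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        int major = majorVersion(header);
        System.out.println("major=" + major + " requires Java " + javaRelease(major)); // major=52 requires Java 8
    }
}
```

Running `/usr/bin/java -version` on the agent itself would show which JVM the launcher actually starts.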
