pepperize / cdk-autoscaling-gitlab-runner
Execute Gitlab jobs on auto-scaled EC2 instances using the Docker Machine executor.
License: MIT License
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security research to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
This policy (AmazonEC2RoleforSSM) will soon be deprecated. Please use AmazonSSMManagedInstanceCore policy to enable AWS Systems Manager service core functionality on EC2 instances.
@danielbayerlein I open this to track until we get all follow-up errors resolved
Since yesterday the GitLab Runner is not working anymore, because there was an update at Docker.
With Docker 23.0.0 release (which happened today), installation of Docker doesn't create an /etc/docker directory anymore by default.
The latest docker-machine version (v0.16.2-gitlab.19) solves this issue. https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/merge_requests/102 https://gitlab.com/gitlab-org/gitlab-runner/-/issues/29594 https://gitlab.com/gitlab-org/gitlab-runner/-/issues/29593
relates to #533
After attempting to deploy a zero config Stack to GovCloud, I found that the runners were failing to be created due to an IAM issue. Here's a sanitized snippet from /var/log/gitlab-runner.log:
Jan 4 20:10:30 ip-REDACTED gitlab-runner: #33[31;1mERROR: Error creating machine: Error in driver during machine creation: Error request spot instance: UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws-us-gov:sts::REDACTED:assumed-role/GitLabRunnerStack-GitlabRunnerManagerRole2F9BC927-REDACTED/i-REDACTED is not authorized to perform: ec2:RequestSpotInstances on resource: arn:aws-us-gov:ec2:us-gov-west-1:REDACTED:subnet/subnet-REDACTED because no identity-based policy allows the ec2:RequestSpotInstances action.
Deeper inspection found the culprit at
Whereas the arn:aws prefix is hard-coded, the actual GovCloud ARN prefix is arn:aws-us-gov.
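A minimal sketch of building the subnet ARN from a partition value instead of a hard-coded arn:aws prefix. The helpers `subnetArn` and `partitionForRegion`, the account ID, and the subnet ID are hypothetical and not taken from the package, and only three partitions are covered. Inside a CDK construct the partition would more likely come from `Stack.of(this).partition` (or `Aws.PARTITION`) than from a region lookup like this one.

```typescript
// Hypothetical helpers for illustration; not part of the package.
type Partition = "aws" | "aws-us-gov" | "aws-cn";

// Assumption: the partition can be inferred from the region name prefix.
function partitionForRegion(region: string): Partition {
  if (region.startsWith("us-gov-")) return "aws-us-gov";
  if (region.startsWith("cn-")) return "aws-cn";
  return "aws";
}

// Builds an EC2 subnet ARN without hard-coding the "arn:aws" prefix.
function subnetArn(region: string, account: string, subnetId: string): string {
  const partition = partitionForRegion(region);
  return `arn:${partition}:ec2:${region}:${account}:subnet/${subnetId}`;
}

// Placeholder account and subnet IDs.
const govArn = subnetArn("us-gov-west-1", "123456789012", "subnet-0abc");
const commercialArn = subnetArn("eu-central-1", "123456789012", "subnet-0abc");
console.log(govArn);
console.log(commercialArn);
```

With a policy resource built this way, the ec2:RequestSpotInstances statement would match the arn:aws-us-gov subnet shown in the log above.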
Currently we are not able to deploy the Stack. We did not face this problem before.
Deployment error:
CustomResource attribute error: Vendor response doesn't contain SecurityGroups.0.GroupName key in object arn:aws:cloudformation:***:stack/GitlabRunnerStack/***RunnerRunnersSecurityGroupDescribeSGCustomResource*** in S3 bucket cloudformation-custom-resource-storage-***
We use the latest versions of AWS CDK and @pepperize packages:
"aws-cdk": "^2.73.0",
"aws-cdk-lib": "^2.73.0",
"@pepperize/cdk-autoscaling-gitlab-runner": "^0.2.456",
"@pepperize/cdk-private-bucket": "^0.0.360",
"@pepperize/cdk-security-group": "^0.0.433",
"@pepperize/cdk-vpc": "^0.0.575",
The logs of the Lambda which runs the describeSecurityGroups command look like this. It looks to me like the response data object is empty.
{
  "Status": "SUCCESS",
  "Reason": "OK",
  "PhysicalResourceId": "sg-***",
  "StackId": "arn:aws:cloudformation:e***:stack/GitlabRunnerStack/***",
  "RequestId": "***",
  "LogicalResourceId": "RunnerRunnersSecurityGroupDescribeSGCustomResource***",
  "NoEcho": false,
  "Data": {}
}
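The missing key can be illustrated with a small sketch. As far as I understand, the custom resource exposes the SDK response as flattened, dot-separated keys such as SecurityGroups.0.GroupName. The `flatten` helper below is a hypothetical stand-in for that step, not the library's code, and the sample group ID and name are made up.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Joins a parent path and a child key with a dot.
const joinKey = (prefix: string, key: string): string => (prefix ? `${prefix}.${key}` : key);

// Hypothetical stand-in: flattens nested objects and arrays into dotted keys.
function flatten(value: Json, prefix = "", out: Record<string, Json> = {}): Record<string, Json> {
  if (Array.isArray(value)) {
    value.forEach((child, index) => flatten(child, joinKey(prefix, String(index)), out));
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      flatten(value[key], joinKey(prefix, key), out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// A populated describeSecurityGroups-like response (made-up values).
const populated = flatten({ SecurityGroups: [{ GroupId: "sg-123", GroupName: "example-name" }] });
// The empty "Data": {} object from the Lambda log above.
const empty = flatten({});

console.log("SecurityGroups.0.GroupName" in populated); // true
console.log("SecurityGroups.0.GroupName" in empty); // false
```

If the lookup works roughly like this, an empty Data object cannot satisfy a request for SecurityGroups.0.GroupName, which would match the deployment error.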
According to the documentation it is possible to specify a custom keyPairName which is applied to the Manager Instance and the Auto Scaling Group.
The launched Runner Instances still use newly created Key Pairs. But for compliance reasons, sometimes it is only possible to use imported key pairs.
Is there a way to also set a keyPairName for the Runner Instances?
Currently, the runners get a public IP address. I would like to disable this so that the communication goes over the NAT gateway. My current configuration looks like this:
export class GitlabRunnerStack extends Stack {
  constructor(scope: Construct, id: string, props: GitlabRunnerStackProps) {
    super(scope, id, props);
    const { gitlabToken, gitlabUrl, cidr } = props;
    const token = new aws_ssm.StringParameter(this, 'Token', {
      parameterName: '/gitlab-runner/token',
      stringValue: gitlabToken,
      type: aws_ssm.ParameterType.STRING,
      tier: aws_ssm.ParameterTier.STANDARD,
    });
    const vpc = new aws_ec2.Vpc(this, 'Vpc', {
      cidr,
      natGateways: 1,
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: aws_ec2.SubnetType.PUBLIC,
          mapPublicIpOnLaunch: false,
        },
        {
          name: 'Private',
          subnetType: aws_ec2.SubnetType.PRIVATE_WITH_NAT,
        },
      ]
    });
    new GitlabRunnerAutoscaling(this, 'Runner', {
      runners: [
        {
          instanceType: aws_ec2.InstanceType.of(
            aws_ec2.InstanceClass.T3,
            aws_ec2.InstanceSize.MEDIUM,
          ),
          token: token,
          configuration: {
            url: gitlabUrl,
            machine: {
              machineOptions: {
                spotPrice: 0.04,
              },
            },
          },
        },
      ],
      network: { vpc: vpc },
    });
  }
}
Hi, I created a stack with the following code:
const keyPair = aws_secretsmanager.Secret.fromSecretNameV2(this, 'SshKeyPair', 'SshKeyPair');
new GitlabRunnerAutoscaling(this, 'GitlabRunner', {
  network: {
    vpc: vpc,
  },
  manager: {
    keyPairName: 'SshKeyPair', // assume there is already the ec2 key pair "SshKeyPair" created beforehand
  },
  runners: [
    {
      keyPair: keyPair,
      instanceType: aws_ec2.InstanceType.of(aws_ec2.InstanceClass.T3, aws_ec2.InstanceSize.MEDIUM),
      token: xxx,
      configuration: {
        machine: {
          machineOptions: {
            keypairName: 'theKeyPairName',
          },
        },
      },
    },
  ],
});
Inside AWS Secrets Manager, I already created the "SshKeyPair" secret with 2 keys:
"theKeyPairName": "-----BEGIN RSA PRIVATE KEY----- xxx ",
"theKeyPairName.pub": "-----BEGIN PUBLIC KEY----- xxx"
But when deploying the CDK stack, the deployment failed because: Received 1 FAILURE signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement on the manager instance ASG.
Since this is related to the cfn-init script, I checked the logs by SSHing into the manager instance and running cat /var/log/messages | grep cloud-init, and found the error message:
[DEBUG] Command 999-retrieve-ec2-key-pair output: /bin/sh: line 1: -----BEGIN: command not found
It seems like the cfn-init script does a shell command substitution "$()" over the secret value, based on this line: f5a173f#diff-38c267fcf5e98b1bf0a4bc4c84a8b1f97d08aac16be65b4908e6c3de8616dfcfR328
What am I supposed to put inside the secret value then? Any help would be appreciated.
During stack synth:
[WARNING] aws-cdk-lib.aws_ec2.MachineImage#latestAmazonLinux is deprecated.
use MachineImage.latestAmazonLinux2 instead
This API will be removed in the next major release.
Searching for AMI in 753334223818:us-east-1
[WARNING] aws-cdk-lib.aws_ec2.MachineImage#latestAmazonLinux is deprecated.
use MachineImage.latestAmazonLinux2 instead
This API will be removed in the next major release.
As an enterprise user that is working behind an internet proxy, I would like to be able to pass environment variables that are used during the initialization of the runner to the constructs so that I can correctly configure the proxy URL and no-proxy list.
I would like to be able to access a private AWS ECR repository from the GitLab runner, i.e. specify a private AWS ECR repository as the image for a GitLab CI job. In order to do so, the amazon-ecr-credential-helper has to be installed on the runner machine.
Currently I do not see a way to customise the runner machine (not the agent).
Poking around code I think it should go here
InitPackage.yum("amazon-ecr-credential-helper"),
Hi,
I tried to pull job images from a private ECR too; however, the job fails to pull the image from ECR with the following error:
ERROR: Job failed: failed to pull image ... with specified policies [always]: Error response from daemon: Head ...: no basic auth credentials (manager.go:237:0s)
The runner role has permission to pull every image from my account, just for testing purposes.
Moreover, I tried different configurations for the Docker environment. Nothing helped.
It appears that multiple people struggle with GitLab runners and ecr-login when using images from ECR for jobs.
I tried to follow various guides, tips and tricks, including this one: https://gist.github.com/dreampuf/a0d416a15299a2ac74a0a5cb8f2871c0
Unfortunately, the GitLab Runner does not register. In GitLab I only see the message New runner, has not contacted yet. Do you have any idea what the problem is?
import { GitlabRunnerAutoscaling } from '@pepperize/cdk-autoscaling-gitlab-runner';
import { aws_ec2, aws_iam, aws_ssm, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

interface GitlabRunnerStackProps extends StackProps {
  readonly gitlabToken: string;
  readonly gitlabUrl: string;
  readonly cidr: string;
}

export class GitlabRunnerStack extends Stack {
  constructor(scope: Construct, id: string, props: GitlabRunnerStackProps) {
    super(scope, id, props);
    const { gitlabToken, gitlabUrl, cidr } = props;
    const token = new aws_ssm.StringParameter(this, 'Token', {
      parameterName: '/gitlab-runner/token',
      stringValue: gitlabToken,
      type: aws_ssm.ParameterType.STRING,
      tier: aws_ssm.ParameterTier.STANDARD,
    });
    const vpc = new aws_ec2.Vpc(this, 'Vpc', {
      cidr,
      natGateways: 1,
    });
    new GitlabRunnerAutoscaling(this, 'Runner', {
      runners: [
        {
          instanceType: aws_ec2.InstanceType.of(
            aws_ec2.InstanceClass.M6G,
            aws_ec2.InstanceSize.MEDIUM,
          ),
          token: token,
          role: new aws_iam.Role(this, 'Role', {
            assumedBy: new aws_iam.ServicePrincipal('ec2.amazonaws.com'),
            managedPolicies: [
              aws_iam.ManagedPolicy.fromAwsManagedPolicyName(
                'AmazonSSMManagedInstanceCore',
              ),
            ],
          }),
          configuration: {
            name: 'gitlab-runner',
            url: gitlabUrl,
          },
        },
      ],
      network: { vpc: vpc },
    });
  }
}
With the latest release (v0.2.361), runner instances can not be started anymore.
Error message:
Error creating machine: Error in driver during machine creation: Error launching instance: UnauthorizedOperation: You are not authorized to perform this operation.
Runner Manager Role:
Seems like the tag InstanceProfile is missing, because when I delete the condition from the role it works.
GitLab is moving to 15.0 with a few breaking changes: https://about.gitlab.com/blog/2022/04/18/gitlab-releases-15-breaking-changes
The following change will probably affect the package: Must explicitly assign AuthenticationType for [runners.cache.s3]. The AuthenticationType must be IAM or credentials.
I was using the latest release that includes ecr credential helper.
Occasionally this fails in spectacular manner. It starts spinning up VMs, those fail, and new ones are spun up in a loop. At one point deleting the VMs was not happening and I ended up with 8 quite large VMs before it hit my resource limit. Luckily I noticed it early or it would have been 💰💰💰💰 :)
I spent some time investigating the issue. It turns out it fails at the apt-get update step. Sorry, I do not have the logs handy.
I found entries in the actual runner (not the manager) logs complaining that acquiring the apt lock failed. This is a known issue on Ubuntu-like systems. When Ubuntu starts, apt-daily.service runs very early in the boot sequence. If you are unlucky, apt-get update or apt-get install ... will conflict with it, as those cannot run concurrently and use a file-based lock.
I am not sure how to solve it; there does not seem to be a universal solution. Best description I found: https://saveriomiroddi.github.io/Handling-the-apt-lock-on-ubuntu-server-installations/
For now, can we make the installation of amazon-ecr-credential-helper optional?
Deploying two GitlabRunnerAutoscaling constructs in the same CDK project (for example one with runners for light workloads and one for heavy workloads) is currently not possible. Every GitlabRunnerAutoscaling instantiates a new Network construct called "Network", a new SecurityGroup called "ManagerSecurityGroup", ... and those names will clash.
Can this Network not be shared between two GitlabRunnerAutoscaling constructs, just like for the cache bucket?
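A toy model of why identical child names clash, assuming the clash happens at the construct-ID level (the issue does not say whether it is construct IDs or physical resource names). The `Scope` class is a hypothetical stand-in for a construct tree and does not use the constructs library. It only encodes the rule that an ID must be unique within its parent scope.

```typescript
// Toy stand-in for a construct tree node; not the real constructs library.
class Scope {
  readonly path: string;
  private readonly children = new Map<string, Scope>();

  constructor(path: string) {
    this.path = path;
  }

  // Adds a child and rejects an ID that already exists in this scope.
  add(id: string): Scope {
    if (this.children.has(id)) {
      throw new Error(`duplicate id '${id}' in scope '${this.path}'`);
    }
    const child = new Scope(`${this.path}/${id}`);
    this.children.set(id, child);
    return child;
  }
}

const stack = new Scope("Stack");

// Two children named "Network" directly under the same scope clash.
stack.add("Network");
let clashed = false;
try {
  stack.add("Network");
} catch (error) {
  clashed = true;
}

// The same child name under two distinct parent scopes does not clash.
const lightNetwork = stack.add("LightRunners").add("Network");
const heavyNetwork = stack.add("HeavyRunners").add("Network");

console.log(clashed);
console.log(lightNetwork.path, heavyNetwork.path);
```

Under that assumption, giving each runner group its own scope, or passing one shared Network into both, would keep the IDs apart.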
When I deploy an instance of the runner stack, the manager fails to create runners with an error log like this:
Running pre-create checks... driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
Creating machine... driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
(runner-abcdef-gitlab-runner-01-02) Launching instance... driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
(runner-abcdef-gitlab-runner-01-02) Missing instance ID, this is likely due to a failure during machine creation driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
(runner-abcdef-gitlab-runner-01-02) Missing key pair name, this is likely due to a failure during machine creation driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
IdleCount is set to 0 so the machine will be created on demand in job context creating=1 idle=0 idleCount=0 idleCountMin=0 idleScaleFactor=0 maxMachineCreate=0 maxMachines=10 removing=0 runner=Abcdef total=1 used=0
ERROR: Error creating machine: Error in driver during machine creation: unable to create key pair: open /etc/gitlab-runner/ssh: no such file or directory driver=amazonec2 name=runner-abcdef-gitlab-runner-01-02 operation=create
ERROR: Machine creation failed error=exit status 1 name=runner-abcdef-gitlab-runner-01-02 time=5.644731027s
The stack looks like this:
new GitlabRunnerAutoscaling(this, "GitlabRunnerAutoscaling", {
  network: {
    vpc: vpc,
  },
  runners: [
    {
      instanceType: ec2.InstanceType.of(
        ec2.InstanceClass.T3,
        ec2.InstanceSize.SMALL,
      ),
      token: token,
      role: runnerExecutionRole,
      configuration: {
        machine: {
          machineOptions: {
            requestSpotInstance: false,
          },
        },
      },
    },
  ],
});
If I set configuration.machine.sshKeyPath to "", the manager is able to create the runner just fine.
I'm using @pepperize/cdk-autoscaling-gitlab-runner@^0.2.111 with aws-cdk@^2.30.0