
control-tower's Issues

Suggestions for improvements in documentation

Maybe it's due to me being new to GCP, but I had some trouble figuring out which APIs I had to activate for Control-Tower to spin up a Concourse setup on GCP. Maybe the following list (that I came up with) would be worth mentioning in the documentation:

  • gcloud services enable compute.googleapis.com
  • gcloud services enable iam.googleapis.com
  • gcloud services enable cloudresourcemanager.googleapis.com
  • gcloud services enable sqladmin.googleapis.com
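For reference, gcloud accepts multiple services in a single invocation, so the list above can be enabled in one step (a sketch using the same service names; assumes an active gcloud project is configured):

```shell
# Enable all four APIs listed above in one call
gcloud services enable \
  compute.googleapis.com \
  iam.googleapis.com \
  cloudresourcemanager.googleapis.com \
  sqladmin.googleapis.com
```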

Should prompt before creation

I tried to use "deploy" to update a control-tower CI and got the name wrong.
It went ahead and created a new one without prompting for confirmation.

I think it should prompt before doing that.

Wrong certificate IP

I'm failing on the bosh deployment with the error below (sorry, I had to replace the public IPs with dummy values, but the last part is reflected correctly):

Deploying:
  Creating instance 'bosh/0':
    Waiting until instance is ready:
      Post https://mbus:<redacted>@11.11.11.110:6868/agent: x509: certificate is valid for 10.0.0.6, 11.11.11.4, not 11.11.11.110

It appears that the concourse certificate that was created expects a specific public IP, which is not the same as the one actually assigned.
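To confirm which addresses a served certificate actually covers, you can inspect its Subject Alternative Names directly (a generic openssl sketch, not control-tower tooling; substitute your director's address and requires OpenSSL 1.1.1+ for `-ext`):

```shell
# Print the SANs of the certificate served on the mbus port
echo | openssl s_client -connect 11.11.11.110:6868 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName
```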

Self-update fails due to missing 'iaas'-flag

Cheers!
The self-update-job in the control-tower-self-update-pipeline fails due to a missing iaas flag:

+ cd control-tower-release
+ chmod +x control-tower-linux-amd64
+ ./control-tower-linux-amd64 deploy concourse-test
Error validating args on deploy: [failed to validate Deploy flags: [--iaas flag not set]]

Concourse was deployed via
./control-tower-darwin-amd64 deploy --github-auth-client-id <client-id> --github-auth-client-secret <client-secret> --iaas AWS --domain <domain> --region eu-west-1 concourse-test
Version: 0.3.1

Seems like the aws_pipeline (and the gcp_pipeline as well) is missing an IAAS parameter.
I was also wondering whether the deploy command in the self-update job is set up to retain configuration from the previous deployment (e.g. github-auth-client-id). Thanks!

Auto update pipeline fails

When running the auto update pipeline, we get the following error:

+ cd control-tower-release
+ chmod +x control-tower-linux-amd64
+ ./control-tower-linux-amd64 deploy concourse-id
error ensuring config bucket exists before deploy: [error determining if bucket [] exists: [InvalidParameter: 1 validation error(s) found.
- minimum field size of 1, HeadBucketInput.Bucket.
]]

100% CPU reported by grafana concourse dashboard with no reason

The Grafana concourse dashboard has been reporting 100% CPU for some time now (it was the same with concourse-up), and it does not change.
I've checked the workers, BOSH EC2 instances, and RDS in the AWS console, and all of them are doing more than fine (usually below 40% at peak) according to AWS monitoring.
I've SSHed to a web node with bosh: htop reports CPU varying between 10-50% on average, rarely hitting 100%. Load is now: 2.20 2.16 2.11.
I have a default setup with 2 workers in AWS. On latest version, with concourse 5.2.0.
Otherwise everything is working as expected, it is just a bit annoying.
[Screenshot: Grafana dashboard showing 100% CPU]

Concourse UI certificate expired "pinned version is not available"

The HTTPS certificate on my control-tower Concourse UI is expired.
It looks like there is a job which ought to auto-renew this called "renew-https-cert", but it is hanging with the following output:

[cog] preparing build
[tick] checking pipeline is not paused
[tick] checking job is not paused
[spins forever] discovering any new versions of control-tower-release
[tick] discovering any new versions of every-day
[spins forever] waiting for a suitable set of input versions
* control-tower-release - pinned version {"tag":"0.8.1"} is not available
[spins forever] checking max-in-flight is not reached

Can anyone help me understand what's wrong here?
What does the "pinned version is not available" message mean? How can I fix?

After I fix that, will the permanently spinning task "discovering any new versions of control-tower-release" unblock, or do I have two problems?
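(For what it's worth, if the pin itself turns out to be the blocker, fly can clear a pinned resource version; a hedged sketch, assuming the pipeline is named control-tower-self-update and you have fly access to it:)

```shell
# Remove the pin so Concourse can resolve an available version again
fly -t <target> unpin-resource -r control-tower-self-update/control-tower-release
```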

Thanks!

Rich

Add support for adding an IAM role to workers

We are using IAM roles on our worker nodes to grant access to an S3 bucket where other Terraform state files are stored. Please add support for attaching an IAM role to the worker nodes.

Web loading is extremely slow

Hello there!

We use control-tower to deploy our concourse instance to AWS and we absolutely love it.
However, as we add more jobs and more pipelines, we are experiencing super slow page load times.

We're currently deploying with these flags:
--iaas aws
--region us-east-1
--workers 4
--worker-type m5
--worker-size 4xlarge
--web-size 2xlarge

And even though the web-size is 2xlarge, it's still very slow (3-6s page load times). From looking in the network tab, this is mostly coming from the "pipelines" and "jobs" calls. We could split pipelines out to separate teams, but since we're a fairly small company (100ish engineers) we appreciate the pipeline visibility, especially during on-call rotations where quickly redeploying a last known version is helpful. We could also start spinning up new control-tower concourse deployments to various sub-domains, but that's a little annoying from a management perspective.

We're wondering if you have any insight into this, or if you are planning on bumping up the options for maximum web node size (t3's would be particularly nice, but beefier instances would be great, too), or maybe it's just time for us to figure out the BOSH deployment on our own :)

Thanks for your help!

Tyler Beebe
Software Engineer
Meetup

Deployment on AWS fails on ruby2.4 compilation

While running installation, I am failing on ruby2.4 compilation.

I am using Control Tower 0.3.1.

AWS_ACCESS_KEY_ID=<key> \
AWS_SECRET_ACCESS_KEY=<secret> \
control-tower deploy \
    --region eu-west-2 \
    --iaas aws \
    --workers 3 \
    ${PCF_SUBDOMAIN_NAME}

This is the output (streamed from BOSH, it appears):

Started validating
  Downloading release 'bosh'... Finished (00:00:19)
  Validating release 'bosh'... Finished (00:00:01)
  Downloading release 'bpm'... Finished (00:00:23)
  Validating release 'bpm'... Finished (00:00:01)
  Downloading release 'bosh-aws-cpi'... Finished (00:00:01)
  Validating release 'bosh-aws-cpi'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Finished (00:00:01)
  Validating stemcell... Finished (00:00:00)
Finished validating (00:00:49)

Started installing CPI
  Compiling package 'ruby-2.4-r4/0cdc60ed7fdb326e605479e9275346200af30a25'... Failed (00:01:45)
Failed installing CPI (00:01:45)

Installing CPI:
  Compiling job package dependencies for installation:
    Compiling job package dependencies:
      Compiling package:
        Running command: 'bash -x packaging', stdout: 'checking for a BSD-compatible install... /usr/bin/install -c
...
...
...
+ make install
+ tar xzf ruby-2.4.4.tar.gz
+ set -e
+ cd ruby-2.4.4
++ uname -m
+ '[' x86_64 == ppc64le ']'
+ LDFLAGS='-Wl,-rpath -Wl,/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4'
+ CFLAGS=-fPIC
+ ./configure --prefix=/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4 --disable-install-doc --with-opt-dir=/usr/local/opt/openssl:/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4 --without-gmp
+ make
ar: `u' modifier ignored since `D' is the default (see `U')
ar: `u' modifier ignored since `D' is the default (see `U')
+ make install
+ mkdir /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh
+ cp gemrc /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/gemrc
+ cp compile.env /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/compile.env
+ cp runtime.env /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/runtime.env
+ source /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/runtime.env
++ export PATH=/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bin:/usr/local/bin:/usr/bin:/bin
++ PATH=/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bin:/usr/local/bin:/usr/bin:/bin
++ export GEMRC=/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/gemrc
++ GEMRC=/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/packages/ruby-2.4-r4/bosh/gemrc
+ tar zxvf rubygems-2.7.6.tgz
+ set -e
+ cd rubygems-2.7.6
+ ruby setup.rb --no-ri --no-rdoc
/home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require': cannot load such file -- zlib (LoadError)
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/package.rb:47:in `<top (required)>'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/commands/pristine_command.rb:3:in `<top (required)>'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/core_ext/kernel_require.rb:59:in `require'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/commands/setup_command.rb:583:in `regenerate_binstubs'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/commands/setup_command.rb:155:in `execute'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/command.rb:313:in `invoke_with_build_args'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/command_manager.rb:171:in `process_args'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/command_manager.rb:141:in `run'
	from /home/ubuntu/.bosh/installations/30180f1d-f2c0-43ec-5256-2500ad164e98/tmp/bosh-release-pkg246764010/rubygems-2.7.6/lib/rubygems/gem_runner.rb:59:in `run'
	from setup.rb:46:in `<main>'
+ echo 'Cannot install rubygems'
+ exit 1
':
          exit status 1

Exit code 1
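The `cannot load such file -- zlib` at the rubygems step usually means Ruby was configured on a machine that lacks the zlib development headers, so the zlib extension was never built. A hedged fix for Debian/Ubuntu hosts (package names are the usual ones, adjust for your distro), after which the deploy can be retried:

```shell
# Install the headers Ruby's build needs, then re-run control-tower deploy
sudo apt-get update
sudo apt-get install -y zlib1g-dev libssl-dev libreadline-dev
```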

Add Support for AWS Sub-Account roles

AWS account credentials (access_key_id and secret_access_key) can be supplemented with a role ARN to provide access to sub-accounts in AWS. If Control Tower is deployed to one of these sub-accounts, the self-update job fails because it cannot find the S3 bucket. Could support be added for this account-management feature?

control-tower does not seem to respect --zone on GCP

When creating a new deployment with control-tower, I specified a zone of us-central1-a. All VMs that were created were in us-central1-b, and the deployment ultimately failed. Changing the zone to us-central1-b resulted in a successful deployment.

`--domain` flag didn't work

I am trying to deploy Concourse with a domain name

I ran:

control-tower deploy --iaas AWS myci.example.com --domain myci.example.com

The deploy succeeded, but it has set "concourse-url" to an ip address.
Concourse is reachable at that address.

I would like it to use myci.example.com instead.

There are no obvious errors in the output.

How can I debug?

Thanks,

Rich

why can't I find my control-tower generated credentials in credhub

First of all, thanks a lot for your tooling - I tried it out yesterday on GCP, and after a few issues stemming from my lack of GCP knowledge, it worked like a charm.

One thing I was wondering about was that I couldn't find my credentials (e.g. Concourse password) in credhub after spinning up the setup on GCP:

$ eval "$(control-tower info --region europe-west1  --iaas gcp --env concourse)"
$ credhub find
credentials: []

Is this the intended behavior? If yes, why don't you store all generated credentials of the BOSH deployment in credhub - or am I missing something?

Thanks in advance!

GCP: --domain needs to reflect the FQDN and not the DNS zone

Hello,
the current documentation states:

control-tower deploy --domain chimichanga.engineerbetter.com chimichanga
In the example above control-tower will search for a hosted zone that matches chimichanga.engineerbetter.com or engineerbetter.com and add a record to the longest match (chimichanga.engineerbetter.com in this example).

In GCP, if I create a DNS zone pointing to chimichanga.engineerbetter.com, then control-tower will actually create a record such as:

     chimichanga.engineerbetter.com.chimichanga.engineerbetter.com

It works ok if the DNS zone corresponds to engineerbetter.com. It will then add a record such as:

     chimichanga.engineerbetter.com

i/o timeout' in 'create_vm' CPI method

Seeing a few cases of the following error with Control Tower v0.3.0. Does anyone have anything to offer?

Task 10 | 12:28:32 | Error: CPI error 'Bosh::Clouds::CloudError' with message 'Creating vm:
 Failed to find Google Image 'stemcell-e5d99deb-c5b4-4f5f-53ad-87ef7e71d15a': Get 
https://www.googleapis.com/compute/v1/projects/ps-amcginlay/global/images/
stemcell-e5d99deb-c5b4-4f5f-53ad-87ef7e71d15a?alt=json: oauth2: cannot fetch token: 
Post https://accounts.google.com/o/oauth2/token: dial tcp 108.177.111.84:443: i/o timeout' 
in 'create_vm' CPI method (CPI request ID: 'cpi-653711')

Deployment in AWS us-east-1 failing - vm for bosh/0 timing out

Summary

Cannot complete installation on AWS us-east-1 region. It is failing with the following error message:

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'ami-0a516ac768a92a3b2 light'... Finished (00:00:38)
  Waiting for the agent on VM 'i-01a16381f3c676634' to be ready... Failed (00:06:20)
Failed deploying (00:06:58)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Creating instance 'bosh/0':
    Waiting until instance is ready:
      Starting SSH tunnel:
        Starting SSH tunnel:
          Failed to connect to remote server:
            dial tcp 3.216.195.235:22: connect: operation timed out

Exit code 1
exit status 1

Misc. Info:

  • AWS keys have admin privileges
  • Command being executed is control-tower deploy --iaas aws --region us-east-1 example-name (and destroy)
  • Running control-tower -v gives: Control-Tower version 0.8.2
  • Deploying from: Apple Macbook 15", 2018. macOS Mojave 10.14.6

Steps to Recreate

  1. I ran control-tower for the first time several days ago, and it did work. But for various reasons I needed to destroy the installation.
  2. Reran the deployment command as above today using a different project name and started encountering this error.
  3. Tried for multiple different AWS accounts and same error happens on all of them.

Error during deploy: "panic: runtime error: invalid memory address or nil pointer dereference"

Hi team,

Trying to perform a deploy using the following command:

AWS_ACCESS_KEY_ID=<redacted>  \
AWS_SECRET_ACCESS_KEY=<redacted> \
./control-tower-darwin-amd64 deploy --iaas aws --region ap-southeast-2 ci-test

I am receiving this error after the deploy starts

Started deploying
  Waiting for the agent on VM '<redacted>'... Finished (00:00:00)
  Stopping jobs on instance 'unknown/0'... Finished (00:00:00)
  Unmounting disk 'vol-<redacted>'... Finished (00:00:05)
  Deleting VM '<redacted>'... Finished (00:00:34)
  Creating VM for instance 'bosh/0' from stemcell 'ami-<redacted> light'... Finished (00:00:40)
  Waiting for the agent on VM 'i-<redacted>' to be ready... Finished (00:00:31)
  Attaching disk 'vol-<redacted>' to VM 'i-<redacted>'... Finished (00:00:22)
  Rendering job templates... Finished (00:00:09)
  Compiling package 'ruby-2.4-r5/726cbb2214e138b576700db6a30698edb2b994e2'... Skipped [Package already compiled] (00:00:16)
  Compiling package 'bpm-runc/c0b41921c5063378870a7c8867c6dc1aa84e7d85'... Skipped [Package already compiled] (00:00:16)
  Compiling package 'golang/27413c6b5a88ea20a24a9eed74d4b090b7b88331'...

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x106b4f3]

goroutine 629 [running]:
io.copyBuffer(0x1bfdd00, 0xc420446210, 0x0, 0x0, 0xc4205a2000, 0x8000, 0x8000, 0xc420446210, 0xc420446210, 0x19cb9c0)
	/usr/local/go/src/io/io.go:400 +0x143
io.Copy(0x1bfdd00, 0xc420446210, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:362 +0x5a
net.genericReadFrom(0x1bfc140, 0xc42000c090, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:597 +0x84
net.(*TCPConn).readFrom(0xc42000c090, 0x0, 0x0, 0xc420598e08, 0x100e04d, 0x1998080)
	/usr/local/go/src/net/tcpsock_posix.go:51 +0x4d
net.(*TCPConn).ReadFrom(0xc42000c090, 0x0, 0x0, 0x240ac10, 0xc42000c090, 0xc42001c001)
	/usr/local/go/src/net/tcpsock.go:103 +0x5f
io.copyBuffer(0x1bfc140, 0xc42000c090, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1a9ff80, 0x0, 0x1bfc140)
	/usr/local/go/src/io/io.go:386 +0x31a
io.Copy(0x1bfc140, 0xc42000c090, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:362 +0x5a
github.com/cloudfoundry/bosh-cli/deployment/sshtunnel.(*sshTunnel).Start.func3(0x1c0bfa0, 0xc42000c090, 0x0, 0x0, 0xc4203989b0, 0xc420354480)
	/tmp/build/a9832f70/gopath/src/github.com/cloudfoundry/bosh-cli/deployment/sshtunnel/ssh_tunnel.go:85 +0xc3
created by github.com/cloudfoundry/bosh-cli/deployment/sshtunnel.(*sshTunnel).Start
	/tmp/build/a9832f70/gopath/src/github.com/cloudfoundry/bosh-cli/deployment/sshtunnel/ssh_tunnel.go:84 +0x314

I am running version 0.8.2 of Control Tower:

./control-tower-darwin-amd64 --version
Control-Tower version 0.8.2

and bosh-cli version 6.1.0-9:

bosh --version
version 6.1.0-9c1c210c-2019-09-18T17:33:48Z

Succeeded

Any ideas about how to troubleshoot it?

Get shell on the worker instance?

Hi,

Do you know how I can ssh into the worker instance?
I am having "too many open files" and out of memory issues during my builds and I want to investigate.

Thanks,

Rich

Error while upgrading an environment

Because the built-in pipeline hangs, I tried upgrading the environment from the CLI:
./control-tower-darwin-amd64 deploy --iaas aws --region eu-west-1 concourse-id

But i get the following error:

  Creating VM for instance 'bosh/0' from stemcell 'ami-0a8dd0a21930bc6ab light'... Failed (00:09:51)
Failed deploying (00:09:51)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Creating instance 'bosh/0':
    Creating VM:
      Creating vm with stemcell cid 'ami-0a8dd0a21930bc6ab light':
        CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Address 10.0.0.6 is in use.","ok_to_retry":false}

Exit code 1
exit status 1

Of course `10.0.0.6` is in use by the currently running environment.

Register external worker to ControlTower deployment

I have a bosh operation file that adds a public-key for authenticating an external worker. I do not see any way to do this with ControlTower. Is there a different mechanism for external workers that I'm not seeing?

initial deploy fails

Do I need to create a bucket before deploying? The documentation isn't very clear.

error

./control-tower deploy poc_concourse --iaas aws

error getting initial config before deploy: [error persisting new config after setting values [NoSuchBucket: The specified bucket does not exist

Add support for using your own key

Add an option for using your own (existing) AWS instance key for accessing the web node.
This can be handy for accessing logs, compliance checks, etc.

`control-tower info` should provide login info for Concourse

After testing Control Tower a little more, I stumbled over the use case where I wanted to script the login to Concourse after control-tower deploy finished.

The log output of the deployment contains the line

DEPLOY SUCCESSFUL. Log in with:
fly --target concourse login --insecure --concourse-url...

However, when I want to get the Concourse login information after running the deploy command, I currently don't see a way of getting it from control-tower. control-tower info... does not contain the Concourse credentials.

For automation purposes, it would be extremely helpful either to have the Concourse credentials in CredHub after the deployment (as described in #2) or to be able to use control-tower info to get them after the deployment.

clock skew on client gives misleading error

I was running control-tower in a VM which had a clock that was sufficiently wrong that AWS API calls were failing.

Example aws command on this machine:

$ aws sts get-caller-identity

An error occurred (SignatureDoesNotMatch) when calling the GetCallerIdentity operation: Signature expired: 20191029T154623Z is now earlier than 20191029T163344Z (20191029T164844Z - 15 min.)

I got the following unhelpful error out of control-tower:

$ control-tower deploy --iaas AWS ci.example.com --domain ci.example.com --github-auth-client-secret xxx --github-auth-client-id xxx
error ensuring config bucket exists before deploy: [error determining if bucket [] exists: [InvalidParameter: 1
validation error(s) found.
- minimum field size of 1, HeadBucketInput.Bucket.
]]

After I fixed the clock, everything started working again, but it took me quite a while to work out what was going on because of the strange error. If the control-tower tool had passed through the underlying "Signature expired" error, I would have been able to fix this a lot more quickly.
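For context, SigV4-style request signing rejects timestamps that drift more than about 15 minutes from the server's clock, which is why a wrong client clock fails every call. An illustrative Python sketch of that window check (not AWS's actual code; the timestamps are taken from the error above):

```python
from datetime import datetime, timedelta, timezone

# AWS tolerates roughly this much clock drift between client and server
MAX_SKEW = timedelta(minutes=15)

def signature_expired(signed_at: datetime, server_now: datetime) -> bool:
    """Return True if a request signed at signed_at falls outside the skew window."""
    return abs(server_now - signed_at) > MAX_SKEW

server_now = datetime(2019, 10, 29, 16, 33, 44, tzinfo=timezone.utc)
signed_at = datetime(2019, 10, 29, 15, 46, 23, tzinfo=timezone.utc)
print(signature_expired(signed_at, server_now))  # ~47 minutes of skew exceeds the window
```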

Dependency on Xcode command line developer tools missing

I currently get the following error when trying to run control-tower deploy on GCP:

Installing CPI:
  Compiling job package dependencies for installation:
    Compiling job package dependencies:
      Compiling package:
        Running command: 'bash -x packaging', stdout: '', stderr: '+ set -e
+ set -u
+ PACKAGES_DIR=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages
++ cd /Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages/golang
++ pwd -P
+ export GOROOT=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages/golang
+ GOROOT=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages/golang
+ export PATH=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages/golang/bin:/usr/local/bin:/usr/bin:/bin
+ PATH=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/packages/golang/bin:/usr/local/bin:/usr/bin:/bin
+ mkdir -p /Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/go/src
+ mv /Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/bosh-google-cpi /Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/go/src/
+ cd /Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/go/src/bosh-google-cpi
+ export GOPATH=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/go
+ GOPATH=/Users/USER/.bosh/installations/f7c32b4b-8aee-49e4-704b-7537261a33cf/tmp/bosh-release-pkg811058671/go
+ make build
dyld: Library not loaded: @rpath/DVTFoundation.framework/Versions/A/DVTFoundation
  Referenced from: /Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild
  Reason: no suitable image found.  Did find:
	/Applications/Xcode.app/Contents/Developer/usr/bin/../../../SharedFrameworks/DVTFoundation.framework/Versions/A/DVTFoundation: cannot load '/Applications/Xcode.app/Contents/Developer/usr/bin/../../../SharedFrameworks/DVTFoundation.framework/Versions/A/DVTFoundation' because Objective-C garbage collection is not supported
xcrun: error: unable to locate xcodebuild, please make sure the path to the Xcode folder is set correctly!
xcrun: error: You can set the path to the Xcode folder using /usr/bin/xcode-select -switch
':
          exit status 69

Exit code 1

Seems like installing the XCode command line tools via

/usr/bin/xcode-select --install
sudo xcode-select --switch /Library/Developer/CommandLineTools

fixes the problem for me. This dependency / prerequisite is not stated in the README and IMHO should be added.

Thanks a lot for your work!

RDS root certificates must be updated by 3/5/2020

Problem: RDS root certificates must be updated from rds-ca-2015 to rds-ca-2019 by 3/5/2020.

Solution: Update the root certificate used for spinning up RDS via Terraform and update the root certificate located in

const RDSRootCert = `-----BEGIN CERTIFICATE-----

Note: I will be happy to implement this change once we agree on the process for doing so.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL-certificate-rotation.html

Telegraf password not redacted on deploy

+ telegraf:
+   influxdb:
+     database: concourse
+     password: fu6aytlsfpkrneowda3b
+     url: http://10.0.0.8:8086
+     username: admin

Spotted this in one of the system test runs.

Scale worker instances according to build queue?

Is it possible to scale the number of workers according to the length of the build queue?

It seems wasteful to be paying for an m4.xlarge instance 24x7 when it is only used sporadically.
Jenkins has this feature already.

Thanks,

Rich

Missing AWS tag on some resources

Some of the resources are not tagged in AWS. What I found:

  • some, but not all, BOSH-attached volumes are untagged: /dev/xvda, /dev/sdb
  • worker attached volumes are not tagged.
  • web attached volumes are not tagged.
  • RDS parameter group not tagged
  • RDS sec. group not tagged
  • RDS option group not tagged
  • S3 config not tagged

There might be other resources as well.

deploy in AWS us-west-2 failing, could not find AMI

Here's the error; where does this AMI live?

Deploying:
  Creating instance 'bosh/0':
    Creating VM:
      Creating vm with stemcell cid 'ami-02e0c74167032a30b light':
        CPI 'create_vm' method responded with error: CmdError{"type":"Bosh::Clouds::CloudError","message":"could not find AMI 'ami-02e0c74167032a30b'","ok_to_retry":false}

Exit code 1

IP Whitelisting - "Do you need to add your IP?"

Hi! I'm using a deployment that I created with a command like this:

control-tower deploy --iaas aws \
  --region us-west-2 \
  --domain <domain> \
  --workers 2 \
  --worker-size large \
  --github-auth-client-id <id> \
  --github-auth-client-secret <secret> \
  --add-tag ProvisionedBy=control-tower \
  concourse

Yesterday things were working fine. Today, however, when I try to query info on the deployment, I'm getting this:

control-tower info --region us-west-2  --iaas AWS --env concourse

Do you need to add your IP 162.246.197.181 to the control-tower-concourse-director security group/source range entry for director firewall (for ports 22, 6868, and 25555)?

I can't find anything in the documentation about this issue. How do I solve it?
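One way to add your IP is via the AWS CLI (a hedged sketch; the group name is taken from the prompt above and assumed to be a classic/named security group, so adjust to --group-id if yours is VPC-scoped):

```shell
# Allow the current IP into the director security group on the three listed ports
for port in 22 6868 25555; do
  aws ec2 authorize-security-group-ingress \
    --region us-west-2 \
    --group-name control-tower-concourse-director \
    --protocol tcp --port "$port" \
    --cidr 162.246.197.181/32
done
```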

Thanks!

Changing domain leaves old certificate on credhub

I am working on migrating from concourse-up to control-tower, and I want to do the following.

  1. Leave my concourse-up deployment deployed at myconcourse.example.com
  2. Deploy a new concourse using control-tower at myconcourse-v2.example.com
  3. Migrate pipelines across one by one
  4. Leave both running a while just to access logs and history
  5. Delete the concourse-up deployment
  6. Change the domain for the control-tower deployment to myconcourse.example.com

I think there is a problem with step 6. I have experimented with deploying control-tower to a domain, for example mytest-v2.example.com, and then changing the domain in the deploy script to mytest.example.com and running again.

This runs successfully and changes the domain, but the certificate used by credhub is the old one, so I get errors like this in my resources:

Finding variable 'ssh-private-key': Get https://mytest.example.com:8844/info: x509: certificate is valid for mytest-v2.example.com, not mytest.example.com

Unable to build images with packer in concourse job

I've got a concourse pipeline that uses packer to build a windows instance. To connect to the windows instance, packer uses WinRM which utilizes TCP port 5986. Unfortunately, the container for the job isn't able to successfully connect to that port and just times out. The windows instance gets created in the default network, and I'm able to talk to it via winrm from outside the control-tower networks. I've tried adjusting the "control-tower-*-[private|public]" firewall rules to allow this port, but that doesn't seem to help. How can I adjust what ports concourse jobs are able to connect to?

Workers don't seem to use spot instances

We have started our concourse deployment using:

(assume-aws-role ops-operations)$ ./control-tower-darwin-amd64 deploy --region eu-west-1 --iaas aws --workers 2 --spot=true --worker-type m5 --worker-size large --web-size medium --add-tag Project=Concourse --domain concoursemaster-production.joske.com --github-auth-client-id id1 --github-auth-client-secret id2 --allow-ips 8.8.8.8 concourse-id

When looking in the AWS console, we don't see spot instances for our worker nodes and we don't see anything related to spot instances on our AWS bill.

Unable to destroy failed install

I'm trying to remove a failed install using control-tower destroy --iaas aws concourse but it throws the following:

Error: aws_security_group.director: "ingress.0.cidr_blocks.0" must contain a valid CIDR, got error parsing: invalid CIDR address: /32
Error: aws_security_group.director: "ingress.1.cidr_blocks.0" must contain a valid CIDR, got error parsing: invalid CIDR address: /32
Error: aws_security_group.director: "ingress.2.cidr_blocks.0" must contain a valid CIDR, got error parsing: invalid CIDR address: /32

How do I remove these resources so I can attempt another install?
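The errors suggest an empty source IP made it into the persisted config, yielding a bare `/32` with no address. An illustrative check of why that string fails CIDR parsing, using only the Python standard library (not control-tower's or Terraform's actual code):

```python
import ipaddress

def valid_cidr(block: str) -> bool:
    """Return True if block parses as a CIDR network, e.g. '1.2.3.4/32'."""
    try:
        ipaddress.ip_network(block, strict=False)
        return True
    except ValueError:
        return False

print(valid_cidr("1.2.3.4/32"))  # a complete CIDR is accepted
print(valid_cidr("/32"))         # an empty address, as in the error above, is rejected
```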

Login to the Bosh or concourse web ec2

Hi,

We would like to log in to the EC2 servers to install the Datadog agent. Can someone tell me which key to use to log in to which server (BOSH or the web node)? Or is there an alternative way to send build metrics to Datadog? I am aware of access to the Grafana dashboard.

Thanks and regards,
Hui

Unable to SSH into my web node

When deploying a Concourse on AWS using Control Tower, Terraform gives back the public key used for accessing the web node. After allowing access on port 22 from my IP to the web node, I use the following command:

ssh -i concourse-key.pem root@<ip-address> or ssh -i concourse-key.pem ubuntu@<ip-address>

But I can't access the VM. This is the error I get:

Unauthorized use is strictly prohibited. All access and activity is subject to logging and monitoring.
Received disconnect from <ip-address> port 22:2: Too many authentication failures
Disconnected from <ip-address> port 22
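"Too many authentication failures" often means the ssh agent is offering every loaded key before the one named with -i, and the server disconnects after a few failures. Forcing only the named identity may help (a sketch; the user names are the same guesses as in the report above):

```shell
# Offer only the specified key, not every key held by the ssh agent
ssh -o IdentitiesOnly=yes -i concourse-key.pem ubuntu@<ip-address>
```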

GitHub auth not working

I want to use GitHub auth on my Concourse, but I can't figure out how.

I have deployed my concourse as follows:

control-tower deploy --iaas AWS ci.example.com --domain ci.example.com --github-auth-client-secret xxx --github-auth-client-id xxx

I have configured my teams as follows:

$ cat main-team.yaml
roles:
- name: owner
  local:
    users: ["admin"]
- name: pipeline-operator
  github:
    orgs: ["MyOrgName"]
$ fly -t ci.example.com set-team -n main --config main-team.yaml

... but the concourse login page does not offer a GitHub login option.

Do you know what I have done wrong and how to debug or fix?

I tried logging into the "web" concourse server and I didn't see any github stuff in its env:

$ ps aux | grep conc
bosh_77+ 11383  0.0  0.0  12944  1020 pts/0    S+   16:01   0:00 grep --color=auto conc
vcap     12106  1.1  3.3 497052 67628 ?        S<sl Oct29  35:19 /var/vcap/packages/concourse/bin/concourse web
$ sudo cat /proc/12106/environ | grep -i githu
# (no match)

I was expecting to see the env vars "CONCOURSE_GITHUB_CLIENT_ID" and "CONCOURSE_GITHUB_CLIENT_SECRET" set there via control-tower.
