runatlantis / terraform-gce-atlantis
A set of Terraform configurations for running Atlantis on @googlecloud Compute Engine
License: Apache License 2.0
This is helpful for pinning the required providers and Terraform version as prerequisites:
terraform {
  required_version = ">= 0.13"
  required_providers {
    # ...
  }
}
example https://github.com/cloudposse/terraform-aws-ecr/blob/master/versions.tf
Note that you get this input out of the box with #57, using module.this.tags.
We currently restrict our versions.tf to 1.2.0 or higher.
As many users might still be running 0.13.0, for example, we should check whether our code is compatible with that Terraform version.
Currently, users of our platform are required to pull the latest
version of the atlantis image. We propose adding the ability to pull the latest prerelease image.
Best practice is to use # comments instead of // comments.
This is only one example.
When a new version of the COS image is available, it will now trigger an update of the instance template (as we always pull the latest version).
We should ignore any changes to this, and possibly allow a user to override or set their own version of the COS image, as updating the instance template will trigger Atlantis to be recreated. (This is painful, especially if the change is applied through Atlantis itself.)
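One way this could be made overridable, sketched with a hypothetical variable and local (these names are not the module's actual interface):

```hcl
# Hypothetical variable: lets a user pin a specific COS image.
variable "machine_image" {
  type        = string
  description = "Optional pinned COS image self link; defaults to the latest stable image."
  default     = null
}

# Latest stable COS image, used only when no pin is provided.
data "google_compute_image" "cos" {
  family  = "cos-stable"
  project = "cos-cloud"
}

locals {
  # coalesce() skips the null pin and falls back to the data-source lookup.
  source_image = coalesce(var.machine_image, data.google_compute_image.cos.self_link)
}
```

With a pin set, a new COS release no longer changes the template's inputs, so no replacement is planned.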
The plan I got:
# module.atlantis.google_compute_instance_template.default must be replaced
+/- resource "google_compute_instance_template" "default" {
~ id = "projects/xxxxxxxx/global/instanceTemplates/atlantis-20230210130232854100000001" -> (known after apply)
~ labels = { # forces replacement
~ "container-vm" = "cos-stable-101-17162-127-5" -> "cos-stable-101-17162-127-8"
}
~ metadata_fingerprint = "djl8in4QXsc=" -> (known after apply)
- min_cpu_platform = "" -> null
~ name = "atlantis-20230210130232854100000001" -> (known after apply)
~ self_link = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/global/instanceTemplates/atlantis-20230210130232854100000001" -> (known after apply)
tags = [
"atlantis-wl45dn",
]
~ tags_fingerprint = "" -> (known after apply)
# (8 unchanged attributes hidden)
+ confidential_instance_config {
+ enable_confidential_compute = (known after apply)
}
~ disk {
~ device_name = "persistent-disk-0" -> (known after apply)
~ interface = "SCSI" -> (known after apply)
- labels = {} -> null
~ mode = "READ_WRITE" -> (known after apply)
- resource_policies = [] -> null
~ source_image = "projects/cos-cloud/global/images/cos-stable-101-17162-127-5" -> "https://www.googleapis.com/compute/v1/projects/cos-cloud/global/images/cos-stable-101-17162-127-8" # forces replacement
~ type = "PERSISTENT" -> (known after apply)
# (4 unchanged attributes hidden)
}
~ disk {
~ boot = false -> (known after apply)
~ interface = "SCSI" -> (known after apply)
- labels = {} -> null
- resource_policies = [] -> null
+ source_image = (known after apply)
~ type = "PERSISTENT" -> (known after apply)
# (5 unchanged attributes hidden)
}
~ network_interface {
+ ipv6_access_type = (known after apply)
~ name = "nic0" -> (known after apply)
~ network = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/global/networks/network" -> (known after apply)
- queue_count = 0 -> null
+ stack_type = (known after apply)
~ subnetwork = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/regions/europe-west4/subnetworks/subnetwork" -> "projects/xxxxxxxx/regions/europe-west4/subnetworks/subnetwork"
# (1 unchanged attribute hidden)
}
~ scheduling {
- min_node_cpus = 0 -> null
# (4 unchanged attributes hidden)
}
# (2 unchanged blocks hidden)
}
Currently, users of Google Cloud Platform (GCP) Virtual Machines (VMs) are unable to specify the machine type when creating or modifying a VM (we use a default value: n2-standard-2). We propose adding the ability for users to specify the machine type of their GCP VM.
This would give users greater control over the resources allocated to their VM, allowing them to optimize for performance, cost, or other factors.
How will users be able to determine the appropriate machine type for their workload? Will there be any guidance provided, or will they need to determine this on their own?
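A minimal sketch of the proposed input (the variable name and default follow the issue text, but the wiring is an assumption):

```hcl
variable "machine_type" {
  type        = string
  description = "Machine type of the Atlantis VM."
  default     = "n2-standard-2" # the current hard-coded default
}

resource "google_compute_instance_template" "default" {
  machine_type = var.machine_type
  # ...
}
```

Guidance on sizing could simply point users to the GCP machine-type families documentation rather than being baked into the module.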
Project-wide SSH keys are stored in the project metadata under Compute Engine. They can be used to log in to all instances within a project, which eases SSH key management, but if a key is compromised the potential security risk impacts every instance in the project.
We currently allow project-wide SSH keys (by suppressing the checkov rule) in the instance template. Preferably this should be made configurable through a variable.
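A sketch of how this could be exposed, using the standard `block-project-ssh-keys` instance metadata key (the variable name is hypothetical):

```hcl
# Hypothetical variable to make this behaviour configurable.
variable "block_project_ssh_keys" {
  type        = bool
  description = "Blocks project-wide SSH keys from being usable on this instance."
  default     = true
}

resource "google_compute_instance_template" "default" {
  # ...
  metadata = {
    block-project-ssh-keys = var.block_project_ssh_keys
  }
}
```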
Input environment variables necessary to bootstrap Atlantis shouldn't be exposed in the Google Cloud UI, as these values contain sensitive information.
Preferably, populate an atlantis.env file that contains the data passed down to var.env_vars, and use envFile in the container spec to persist these environment variables into the Atlantis container.
It seems that running terraform destroy
is prevented because the compute instance does not delete its persistent storage, causing the error:
Error: Error when reading or editing Project Service <PROJECT-ID>/compute.googleapis.com: Error disabling service
"compute.googleapis.com" for project "<PROJECT-ID>": Error waiting for api to disable: Error code 9, message: [Error in service
'compute.googleapis.com': Could not turn off service, as it still has resources in use.
Not sure if this is intentional, but I think if the delete_rule
is changed to ON_PERMANENT_INSTANCE_DELETION
it would allow you to run terraform destroy
while still not deleting the disk when you merely shut down the instance.
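A sketch of the suggested change on the stateful disk (the resource and device names are illustrative): NEVER keeps the disk even through terraform destroy, which blocks disabling the Compute API, whereas ON_PERMANENT_INSTANCE_DELETION removes it only on permanent instance deletion, not on a stop/restart.

```hcl
resource "google_compute_instance_group_manager" "default" {
  # ...

  stateful_disk {
    device_name = "atlantis-disk-0" # illustrative device name
    # Delete the disk only when the instance is permanently deleted
    # (e.g. terraform destroy), not when it is merely stopped.
    delete_rule = "ON_PERMANENT_INSTANCE_DELETION"
  }
}
```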
Consider spot_machine_enabled
instead of
Consider block_project_ssh_keys_enabled
instead of
Module usage
module "atlantis" {
source = "bschaatsbergen/atlantis/gce"
version = "0.1.5"
# insert the 7 required variables
}
Then a resource like this would have a fully qualified address of module.atlantis.google_compute_instance_template.atlantis,
which is redundant.
Consider using default
for your resource names instead of reusing atlantis,
which would result in a fully qualified address of module.atlantis.google_compute_instance_template.default.
Trying to deploy Atlantis on GCE using IAP example.
Container is failing to start because of the filesystem error:
[ 70.188299] konlet-startup[1686]: 2023/03/13 14:05:19 Attempting to unmount device /dev/sdb at /mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0.
[ 70.190423] konlet-startup[1686]: 2023/03/13 14:05:19 Unmounted /mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0
[ 70.190530] konlet-startup[1686]: 2023/03/13 14:05:19 Found 1 volume mounts in container declaration.
[ 70.197085] konlet-startup[1686]: 2023/03/13 14:05:19 Running filesystem checker on device /dev/disk/by-id/google-atlantis-disk-0...
[ 70.199060] konlet-startup[1686]: 2023/03/13 14:05:19 Error: Failed to start container: Volume atlantis-disk-0: Filesystem check failed: Failed to execute command [fsck.ext4 -p /dev/disk/by-id/google-atlantis-disk-0]: exit status 8, details: /dev/disk/by-id/google-atlantis-disk-0 is mounted.
[ 70.199171] konlet-startup[1686]: e2fsck: Cannot continue, aborting.
Also noticed that the chown command fails:
...
[ 17.150286] systemd-networkd[334]: vethdf3fdbb: Gained carrier
[ 17.156971] systemd-networkd[334]: docker0: Gained carrier
[ 18.323133] systemd-networkd[334]: docker0: Gained IPv6LL
[ 18.962881] systemd-networkd[334]: vethdf3fdbb: Gained IPv6LL
[ 39.220362] chown[929]: chown: cannot access '/mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0': No such file or directory
[ 55.846631] konlet-startup[627]: 2023/03/13 14:05:04 Received ImagePull response: ({"status":"Pulling from runatlantis/atlantis","id":"latest"}
...
Atlantis has no external database. Atlantis stores Terraform plan files on disk. If Atlantis loses that data in between a plan and apply cycle, then users will have to re-run plan. Because of this, we want to provision a persistent disk for Atlantis.
Right now, the Atlantis container runs as privileged, but it is unclear why we need to have that set. If we can get it to run as non-privileged, that would be optimal -- otherwise, we should document exactly why it needs to run as privileged so that users understand that need more thoroughly.
https://github.com/bschaatsbergen/terraform-gce-atlantis/blob/main/examples/complete/main.tf#L96-L107 mentions a small section about defining a dns record set (adding an A record) with a managed zone entry.
It would be helpful to mention somewhere that the current implementation is very GCP native or make this part optional.
More context: https://atlantis-community.slack.com/archives/C5MGGAV0C/p1677427493186339
Thanks!
The Slack token, for example, is currently only configurable through the atlantis configuration file. Any idea how to implement it with the current setup?
The smoothest option is to add another cloudinit.write_files
entry, right?
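A sketch of such a write_files entry, assuming the token is injected via templatefile() rather than hard-coded (the path is illustrative; slack-token is the Atlantis server flag name in its config-file form):

```yaml
#cloud-config
write_files:
  - path: /etc/atlantis/config.yaml   # illustrative path
    permissions: "0600"
    content: |
      # Atlantis server config file; keys mirror the CLI flag names.
      slack-token: ${slack_token}     # injected via templatefile(), not committed
```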
Consider adding a minimal example using the Terraform Registry syntax:
module "atlantis" {
source = "bschaatsbergen/atlantis/gce"
version = "0.1.5"
# insert the 7 required variables
}
Ref https://registry.terraform.io/modules/bschaatsbergen/atlantis/gce/latest
The provider should only be in the root module, not in the consumable module.
By removing it, you may also be able to remove the project_id input.
https://registry.terraform.io/modules/terraform-google-modules/container-vm/google/latest
Also, the latest version is 3.1.0 and this module uses 2.x.
It's limiting to only support a conditional for the container image.
It would be good to expose the entire container input, or at least the full container image.
For example, what if I wanted to use a custom image or a custom tag?
You may want to expose more inputs to the upstream module as well.
You could also avoid having to add additional logic for the dev tag (#3).
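A sketch of exposing the full image reference (the variable name and default are assumptions, not the module's current interface):

```hcl
# Hypothetical input exposing the full image reference, including registry and tag.
variable "image" {
  type        = string
  description = "Full container image to deploy, e.g. a custom registry, image, or tag."
  default     = "ghcr.io/runatlantis/atlantis:latest" # illustrative default
}

module "container" {
  source  = "terraform-google-modules/container-vm/google"
  version = "~> 3.1"

  container = {
    image = var.image
  }
}
```

This makes custom images, custom tags, and the dev tag all the same code path: the user just passes a different string.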
machine_image
pinning was introduced in #112, but even with the value set, the instance is replaced when a new COS image comes out.
I believe the issue is in the locals:
labels = merge(var.labels, { "container-vm" = module.container.vm_container_label })
Didn't have time to dig into it, but I believe that if we want to pass the correct label to the module, we need to parse machine_image.
Alternatively, we should update the way local.labels
is generated.
Just raising an issue, in case someone wants to take a look. Might dig into it myself when I have a moment.
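One hypothetical fix along those lines, deriving the label from the pinned image instead of the module output (untested; assumes machine_image is an image path or self link ending in the image name):

```hcl
locals {
  # When machine_image is pinned, derive the "container-vm" label from it so a
  # new COS release no longer changes the labels and forces replacement.
  container_vm_label = (
    var.machine_image != null
    ? basename(var.machine_image) # e.g. "cos-stable-101-17162-127-5"
    : module.container.vm_container_label
  )

  labels = merge(var.labels, { "container-vm" = local.container_vm_label })
}
```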
Redis locking would allow an HA setup using multiple atlantis servers.
https://www.runatlantis.io/docs/server-configuration.html#locking-db-type
Related issue terraform-aws-modules/terraform-aws-atlantis#322
Cloud Memorystore would be the Redis equivalent of AWS ElastiCache.
This module would not need to implement the Redis cluster itself, and it may not even need a count on the GCE instance, since the count could be added to the module call instead.
An example with a Cloud Memorystore instance whose values are fed into this module would be enough to allow people to use it:
resource "google_redis_instance" "cache" {
name = "memory-cache"
memory_size_gb = 1
}
module "atlantis" {
source = "bschaatsbergen/atlantis/gce"
count = 4
name = "atlantis-${count.index}"
env_vars = {
ATLANTIS_LOCKING_DB_TYPE = "redis"
ATLANTIS_REDIS_HOST = ""
ATLANTIS_REDIS_PORT = ""
# could be randomly generated when the redis cluster is born
ATLANTIS_REDIS_PASSWORD = ""
# ...
}
# ...
}
Currently it seems impossible to deploy this module into a project that uses a shared VPC, as the module tries to create a firewall rule and a route (which is not allowed cross-project, and also should not happen).
It also seems that the subnetwork_project
changes after each apply, making the module non-idempotent.
Currently, users of Google Cloud Platform (GCP) Virtual Machines (VMs) are only able to encrypt the attached disks of their VMs using a Google-managed key that is stored in Cloud Key Management Service (KMS).
Increased security: By allowing users to bring their own KMS key, they have the option to use a key that is stored in their own KMS, rather than relying on a Google-managed key. This can provide an additional layer of security, as the user has full control over the management and rotation of their own key.
Industry best practices: Many organizations have established security policies that require the use of customer-managed keys for data encryption. Allowing users to bring their own KMS key would enable GCP users to comply with these policies and align with industry best practices for data security.
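A sketch of how a customer-managed key could be wired into the template's disks (the variable name is hypothetical; the `disk_encryption_key` block is the provider's mechanism for CMEK on instance templates):

```hcl
# Hypothetical input for a customer-managed encryption key.
variable "disk_kms_key_self_link" {
  type        = string
  description = "Self link of a Cloud KMS key used to encrypt the attached disks."
  default     = null
}

resource "google_compute_instance_template" "default" {
  # ...
  disk {
    # ...
    # Only emit the block when a key is supplied; otherwise keep the
    # Google-managed default.
    dynamic "disk_encryption_key" {
      for_each = var.disk_kms_key_self_link == null ? [] : [1]
      content {
        kms_key_self_link = var.disk_kms_key_self_link
      }
    }
  }
}
```

Note the service agent for Compute Engine would also need encrypt/decrypt permission on the key.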
I suppose common mistakes are made even though the examples are very detailed - we should provide an FAQ per example covering commonly made mistakes.
As discussed in #64
Users might apply similar tags to instances, firewall rules, and routes to allow/deny traffic.
The tag we currently use, atlantis,
is very generic and could conflict if there are multiple Atlantis solutions running in the project.
Currently I can think of only 2 solutions: keep the atlantis
tag, or randomize it: atlantis-${random_string.tag.result},
for example.
See tflint, tfsec, checkov.
These can be run in pre-commit hooks and as a PR check.
As users might like to provide their own startup script instead (see #41), we should move the commands that we execute in the startup script to cloud-init.
This would not break any existing functionality (running a chown on the new GCE persistent disk mount) and allows a user to bring their own startup script.
https://registry.terraform.io/modules/terraform-google-modules/container-vm/google/latest?tab=inputs
Also, you might as well move all the atlantis.tf contents to main.tf to keep things together, since it's not different enough to warrant a separate file name, no?
https://github.com/bschaatsbergen/terraform-gce-atlantis/blob/main/startup-script.sh
Use set -e
Use /usr/bin/env bash instead of /bin/bash
Use shellcheck to see if it catches anything
Run shellcheck in a GitHub Action whenever a shell script is modified
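A minimal skeleton applying those suggestions (the mount path is an illustrative placeholder, not the module's actual path):

```shell
#!/usr/bin/env bash
# Fail fast on errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Illustrative placeholder for the persistent-disk mount path.
MOUNT_PATH="/tmp/atlantis-disk-example"

mkdir -p "${MOUNT_PATH}"
echo "prepared ${MOUNT_PATH}"
```

Running shellcheck over the script locally and in CI then catches unquoted expansions and similar issues before they reach an instance.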
Cloud-init performs a chown with the UID 100 (the atlantis user) on the GCE Persistent Disk mount path - we should document this so that users are aware of it when hand-rolling their own Docker image.
See the null label
https://github.com/cloudposse/terraform-null-label
This is the mixin that's used across all cloudposse modules.
https://github.com/cloudposse/terraform-null-label/blob/master/exports/context.tf
This allows using standard inputs such as namespace, tenant, environment, name, attributes, tags, etc
Example of usage
https://github.com/cloudposse/terraform-aws-ecr
https://github.com/cloudposse/terraform-aws-ecr/blob/master/context.tf
So this results in all name arguments set to this value
name = module.this.id
Consider outputting the entire module.atlantis
output "atlantis" {
value = module.atlantis
description = "All of the outputs of the upstream terraform-google-modules/container-vm/google module"
}
It would be useful to make the domain
input optional and allow the module to either reserve a static external IP or take one as an input.
It would make getting up and running a bit faster for testing purposes.
Currently, users of our platform are required to pull either the latest
or prerelease-latest
version of the atlantis image. We propose adding the ability to pull the latest dev image.
The interface
env_vars = [
{
name = "var"
value = "value"
}
]
is more intuitive like this:
env_vars = {
var = "value"
}
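A sketch of the map-based interface and the conversion back to what the container spec expects (local and variable names follow the issue but are otherwise assumptions):

```hcl
# Map-based interface; keys are variable names, values are their values.
variable "env_vars" {
  type        = map(string)
  description = "Environment variables passed to the Atlantis container."
  default     = {}
}

locals {
  # Convert back to the name/value objects the container spec expects.
  container_env = [for k, v in var.env_vars : { name = k, value = v }]
}
```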
Spot instances can be useful for running Atlantis on GCP in combination with a persistent data disk because they allow you to take advantage of lower prices for compute resources while still being able to store your data on a durable, high-performance disk.
By using a PD-SSD to store your data, you can easily restart Atlantis on a new spot instance if the original instance is terminated, without losing any data. This can help you save money on compute costs while still being able to run Atlantis reliably.
Requires #11 to be completed first.
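A sketch of the scheduling change this implies (the toggle name matches the `spot_machine_enabled` suggestion above; GCP requires preemptible with no automatic restart when using the SPOT provisioning model):

```hcl
# Hypothetical toggle for running Atlantis on a Spot VM.
variable "spot_machine_enabled" {
  type    = bool
  default = false
}

resource "google_compute_instance_template" "default" {
  # ...
  scheduling {
    provisioning_model = var.spot_machine_enabled ? "SPOT" : "STANDARD"
    preemptible        = var.spot_machine_enabled
    automatic_restart  = !var.spot_machine_enabled
  }
}
```

Because the data disk is stateful, a preempted instance can be replaced and remount the same disk without losing plan files.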
There are multiple examples here, but here is just one.
Which should be:
tty : true
This helps with consistency and readability, since HCL is formatted with terraform fmt
whereas JSON is not.
For testing this module.
Example https://github.com/cloudposse/terraform-aws-ecr/tree/master/test/src
Consider overriding the startup script path.
And consider overriding the entire metadata_startup_script
input to avoid the templatefile
(if there are benefits to it, I don't know).
I'm seeing something strange where the "live plan" view in the console UI sometimes doesn't display the full plan (especially for larger plans). I used your Terraform module for the GCE setup. If I bypass IAP, everything displays correctly.
I know it was mentioned here that this is due to IAP stripping off the bearer authorization header.
Any ideas for a workaround? Could I put nginx in front of the Atlantis Docker container to deal with the authorization header issue somehow?
See https://github.com/cloudposse/terraform-aws-ecr/tree/master/examples/complete for inspiration