
terraform-gce-atlantis's People

Contributors

artusiep, bschaatsbergen, cblkwell, d-costa, dennislapchenko, desmondh0, github-actions[bot], kvanzuijlen, mvanholsteijn, nitrocode, renovate[bot], tpolekhin

terraform-gce-atlantis's Issues

Test a lower Terraform version (e.g., 0.13.0)

We currently constrain the Terraform version in versions.tf to 1.2.0 or higher.

As there are many users who might still run 0.13.0, for example, we should check whether our code is compatible with that Terraform version.
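
For illustration, a relaxed constraint in versions.tf might look like the sketch below. It assumes the module code has first been tested against 0.13.0, which has not been verified here, and it leaves out any provider constraints.

```hcl
terraform {
  # Assumption: lowered from ">= 1.2.0" only once the module has been
  # confirmed to plan and apply cleanly on Terraform 0.13.0.
  required_version = ">= 0.13.0"
}
```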

prevent instance template from being updated when cos image is updated

When a new version of the COS image is available, it now triggers an update of the instance template (as we always pull the latest version).

We should ignore changes to the image, and possibly allow users to override it or pin their own version of the COS image, because updating the instance template causes Atlantis to be recreated. (This sucks, especially if the change is applied through Atlantis 😄)

The plan I got:

  # module.atlantis.google_compute_instance_template.default must be replaced
+/- resource "google_compute_instance_template" "default" {
      ~ id                   = "projects/xxxxxxxx/global/instanceTemplates/atlantis-20230210130232854100000001" -> (known after apply)
      ~ labels               = { # forces replacement
          ~ "container-vm" = "cos-stable-101-17162-127-5" -> "cos-stable-101-17162-127-8"
        }
      ~ metadata_fingerprint = "djl8in4QXsc=" -> (known after apply)
      - min_cpu_platform     = "" -> null
      ~ name                 = "atlantis-20230210130232854100000001" -> (known after apply)
      ~ self_link            = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/global/instanceTemplates/atlantis-20230210130232854100000001" -> (known after apply)
        tags                 = [
            "atlantis-wl45dn",
        ]
      ~ tags_fingerprint     = "" -> (known after apply)
        # (8 unchanged attributes hidden)

      + confidential_instance_config {
          + enable_confidential_compute = (known after apply)
        }

      ~ disk {
          ~ device_name       = "persistent-disk-0" -> (known after apply)
          ~ interface         = "SCSI" -> (known after apply)
          - labels            = {} -> null
          ~ mode              = "READ_WRITE" -> (known after apply)
          - resource_policies = [] -> null
          ~ source_image      = "projects/cos-cloud/global/images/cos-stable-101-17162-127-5" -> "https://www.googleapis.com/compute/v1/projects/cos-cloud/global/images/cos-stable-101-17162-127-8" # forces replacement
          ~ type              = "PERSISTENT" -> (known after apply)
            # (4 unchanged attributes hidden)
        }
      ~ disk {
          ~ boot              = false -> (known after apply)
          ~ interface         = "SCSI" -> (known after apply)
          - labels            = {} -> null
          - resource_policies = [] -> null
          + source_image      = (known after apply)
          ~ type              = "PERSISTENT" -> (known after apply)
            # (5 unchanged attributes hidden)
        }

      ~ network_interface {
          + ipv6_access_type   = (known after apply)
          ~ name               = "nic0" -> (known after apply)
          ~ network            = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/global/networks/network" -> (known after apply)
          - queue_count        = 0 -> null
          + stack_type         = (known after apply)
          ~ subnetwork         = "https://www.googleapis.com/compute/v1/projects/xxxxxxxx/regions/europe-west4/subnetworks/subnetwork" -> "projects/xxxxxxxx/regions/europe-west4/subnetworks/subnetwork"
            # (1 unchanged attribute hidden)
        }

      ~ scheduling {
          - min_node_cpus       = 0 -> null
            # (4 unchanged attributes hidden)
        }

        # (2 unchanged blocks hidden)
    }
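
One possible way to stop new COS releases from forcing a replacement is to let users pin the image and only fall back to the latest image in the family when nothing is pinned. This is a rough sketch, not the module's actual code: the `machine_image` variable and the `source_image` local are assumed names. Note that the `container-vm` label in the plan above also forces replacement and would need the same treatment.

```hcl
variable "machine_image" {
  type        = string
  description = "Assumed input: COS image to pin, e.g. projects/cos-cloud/global/images/cos-stable-101-17162-127-5."
  default     = null
}

# Resolves to the newest image in the cos-stable family at plan time.
data "google_compute_image" "cos" {
  family  = "cos-stable"
  project = "cos-cloud"
}

locals {
  # Use the pinned image when one is set; otherwise track the latest release.
  source_image = var.machine_image != null ? var.machine_image : data.google_compute_image.cos.self_link
}
```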

Allow users to specify the GCP VM machine type

Currently, users of Google Cloud Platform (GCP) Virtual Machines (VMs) are unable to specify the machine type when creating or modifying a VM (we use a default value: n2-standard-2). We propose adding the ability for users to specify the machine type of their GCP VM.

This would give users greater control over the resources allocated to their VM, allowing them to optimize for performance, cost, or other factors.

Open Questions

How will users be able to determine the appropriate machine type for their workload? Will there be any guidance provided, or will they need to determine this on their own?
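
A minimal sketch of such an input, keeping the current default so existing users see no change. The variable name `machine_type` is an assumption; the instance template would then set `machine_type = var.machine_type`.

```hcl
variable "machine_type" {
  type        = string
  description = "The machine type to run Atlantis on."
  default     = "n2-standard-2"
}
```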

Block Project-wide SSH keys

Project-wide SSH keys are stored in the Compute Engine project metadata. They can be used to log in to all instances within a project. Using project-wide SSH keys eases SSH key management, but if the keys are compromised, the security risk extends to all instances within the project.

We currently allow project-wide SSH keys in the instance template (by suppressing the Checkov rule). Preferably, this should be made configurable through a variable.
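
A sketch of how this could be made configurable, assuming a hypothetical `block_project_ssh_keys` variable. The `block-project-ssh-keys` metadata key is the one Compute Engine uses for this setting; the local would be merged into the instance template's `metadata` argument.

```hcl
variable "block_project_ssh_keys" {
  type        = bool
  description = "Assumed input: when true, project-wide SSH keys cannot be used to log in to the instance."
  default     = true
}

locals {
  # Metadata values are strings, so the boolean is converted explicitly.
  ssh_metadata = {
    "block-project-ssh-keys" = tostring(var.block_project_ssh_keys)
  }
}
```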

Ensure that environment variables are not shown in the UI

Input environment variables necessary to bootstrap atlantis shouldn't be exposed in the Google Cloud UI as these values contain sensitive information.

Preferably, populate an atlantis.env file with the data passed to var.env_vars and use envFile in the container spec to load these environment variables into the Atlantis container.

Can't run `terraform destroy` when using the module.

It seems that running terraform destroy fails because the compute instance does not delete its persistent storage, which causes this error:

│ Error: Error when reading or editing Project Service <PROJECT-ID>/compute.googleapis.com: Error disabling service
"compute.googleapis.com" for project "<PROJECT-ID>": Error waiting for api to disable: Error code 9, message: [Error in service 
'compute.googleapis.com': Could not turn off service, as it still has resources in use.

Not sure if this is intentional, but I think that changing the delete_rule to ON_PERMANENT_INSTANCE_DELETION would allow terraform destroy to succeed while still keeping the disk when the instance is merely shut down.
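
As a sketch of the suggested change, assuming the disk is declared as a stateful disk on a managed instance group. The resource name and the device name (taken from the `atlantis-disk-0` volume that appears in the container logs) are assumptions, and the other required arguments are omitted.

```hcl
resource "google_compute_instance_group_manager" "default" {
  # ... other arguments omitted

  stateful_disk {
    device_name = "atlantis-disk-0"
    # Delete the disk only when the instance is permanently deleted,
    # not when it is stopped or recreated.
    delete_rule = "ON_PERMANENT_INSTANCE_DELETION"
  }
}
```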

Consider using `default` for your resource names instead of reusing `atlantis`

Module usage

module "atlantis" {
  source  = "bschaatsbergen/atlantis/gce"
  version = "0.1.5"
  # insert the 7 required variables 
}

Then a resource like this

https://github.com/bschaatsbergen/terraform-gce-atlantis/blob/88f71057b47fd89d31da5a8c57c4bd8f08a2615d/main.tf#L14

Would have a fully qualified address of module.atlantis.google_compute_instance_template.atlantis which is redundant.

Consider using default for your resource names instead of reusing atlantis which would result in a fully qualified address of module.atlantis.google_compute_instance_template.default
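
If the resource is renamed, a `moved` block (available from Terraform 1.1) would let existing users upgrade without the template being destroyed and recreated. A minimal sketch, assuming the rename happens exactly as proposed:

```hcl
moved {
  from = google_compute_instance_template.atlantis
  to   = google_compute_instance_template.default
}
```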

Failed to start container: Volume atlantis-disk-0: Filesystem check failed

Trying to deploy Atlantis on GCE using IAP example.
Container is failing to start because of the filesystem error:

[   70.188299] konlet-startup[1686]: 2023/03/13 14:05:19 Attempting to unmount device /dev/sdb at /mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0.
[   70.190423] konlet-startup[1686]: 2023/03/13 14:05:19 Unmounted /mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0
[   70.190530] konlet-startup[1686]: 2023/03/13 14:05:19 Found 1 volume mounts in container  declaration.
[   70.197085] konlet-startup[1686]: 2023/03/13 14:05:19 Running filesystem checker on device /dev/disk/by-id/google-atlantis-disk-0...
[   70.199060] konlet-startup[1686]: 2023/03/13 14:05:19 Error: Failed to start container: Volume atlantis-disk-0: Filesystem check failed: Failed to execute command [fsck.ext4 -p /dev/disk/by-id/google-atlantis-disk-0]: exit status 8, details: /dev/disk/by-id/google-atlantis-disk-0 is mounted.
[   70.199171] konlet-startup[1686]: e2fsck: Cannot continue, aborting.

Also noticed that chown command fails as well:

...
[   17.150286] systemd-networkd[334]: vethdf3fdbb: Gained carrier
[   17.156971] systemd-networkd[334]: docker0: Gained carrier
[   18.323133] systemd-networkd[334]: docker0: Gained IPv6LL
[   18.962881] systemd-networkd[334]: vethdf3fdbb: Gained IPv6LL
[   39.220362] chown[929]: chown: cannot access '/mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0': No such file or directory
[   55.846631] konlet-startup[627]: 2023/03/13 14:05:04 Received ImagePull response: ({"status":"Pulling from runatlantis/atlantis","id":"latest"}
...

Provision a persistent disk for Atlantis

Atlantis has no external database. Atlantis stores Terraform plan files on disk. If Atlantis loses that data in between a plan and apply cycle, then users will have to re-run plan. Because of this, we want to provision a persistent disk for Atlantis.

Does Atlantis have to run as a privileged container?

Right now, the Atlantis container runs as privileged, but it is unclear why we need to have that set. If we can get it to run as non-privileged, that would be optimal -- otherwise, we should document exactly why it needs to run as privileged so that users understand that need more thoroughly.

Update examples to support optional dns/ssl support for cases where domains are managed at another registrar

https://github.com/bschaatsbergen/terraform-gce-atlantis/blob/main/examples/complete/main.tf#L96-L107 mentions a small section about defining a dns record set (adding an A record) with a managed zone entry.

It would be helpful to mention somewhere that the current implementation is very GCP native or make this part optional.

More context: https://atlantis-community.slack.com/archives/C5MGGAV0C/p1677427493186339

Thanks!

Expose entire container input and/or container image

It's limiting to only support a conditional for the container image.

It would be good to expose the entire container input or at least the full container image.

For example, what if I wanted to use a custom image or a custom tag?

https://github.com/bschaatsbergen/terraform-gcp-atlantis/blob/474dbae438ca7005f03a1de3b47f962c694e8cb1/atlantis.tf#L9

You may want to expose more inputs to the upstream module as well

You can also avoid having to put in additional logic for the dev tag #3.
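
A sketch of what exposing the full image reference could look like. The variable name `image` and the registry path in the module call are made up for illustration.

```hcl
variable "image" {
  type        = string
  description = "Assumed input: full container image reference, including the tag."
  default     = "ghcr.io/runatlantis/atlantis:latest"
}
```

A caller could then point the module at any image or tag without extra conditionals in the module itself:

```hcl
module "atlantis" {
  source = "bschaatsbergen/atlantis/gce"

  # Hypothetical custom image in a private registry.
  image = "europe-docker.pkg.dev/my-project/images/atlantis:custom"

  # ... other required variables omitted
}
```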

Instance being replaced even when the machine_image is pinned

machine_image pinning was introduced in #112 but even with the value set, the instance is being replaced when a new COS image comes out.

I believe the issue is in the locals:

  labels               = merge(var.labels, { "container-vm" = module.container.vm_container_label })

Didn't have time to dig into it, but I believe if we want to pass the correct label to the module, we need to parse machine_image. Alternatively we should update the way local.labels is generated.
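
One way this could be sketched is to derive the label from the pinned image. It assumes the `container-vm` label is simply the image name, that is, the last path segment of `machine_image`, which matches the values in the plan output of the related issue but is not confirmed.

```hcl
locals {
  # basename() returns the last path segment, e.g.
  # "projects/cos-cloud/global/images/cos-stable-101-17162-127-5"
  # becomes "cos-stable-101-17162-127-5".
  container_vm_label = var.machine_image != null ? basename(var.machine_image) : module.container.vm_container_label

  labels = merge(var.labels, { "container-vm" = local.container_vm_label })
}
```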

Just raising an issue, in case someone wants to take a look. Might dig into it myself when I have a moment.

option to introduce multiple instances with redis locking

Redis locking would allow an HA setup using multiple atlantis servers.

https://www.runatlantis.io/docs/server-configuration.html#locking-db-type

Related issue terraform-aws-modules/terraform-aws-atlantis#322

Memorystore would be the Redis equivalent of AWS ElastiCache.

This module would not need to implement the Redis cluster itself, and it may not need a count on the GCE instance, since the count could be added to the module call itself.

An example of a Memorystore instance with its values fed into this module would be enough to allow people to use it:

resource "google_redis_instance" "cache" {
  name           = "memory-cache"
  memory_size_gb = 1
}

module "atlantis" {
  source     = "bschaatsbergen/atlantis/gce"

  count = 4

  name = "atlantis-${count.index}"

  env_vars = {
    ATLANTIS_LOCKING_DB_TYPE = "redis"
    ATLANTIS_REDIS_HOST      = ""
    ATLANTIS_REDIS_PORT      = ""

    # could be randomly generated when the redis cluster is born
    ATLANTIS_REDIS_PASSWORD  = ""

    # ...
  }

  # ...
}
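
The empty values in the example could plausibly be wired to the Memorystore instance's exported attributes. This is a sketch only: it assumes AUTH is enabled on the instance, and `redis_env_vars` is a hypothetical local that would be merged into `env_vars`.

```hcl
resource "google_redis_instance" "cache" {
  name           = "memory-cache"
  memory_size_gb = 1
  auth_enabled   = true # needed for auth_string to be populated
}

locals {
  redis_env_vars = {
    ATLANTIS_LOCKING_DB_TYPE = "redis"
    ATLANTIS_REDIS_HOST      = google_redis_instance.cache.host
    ATLANTIS_REDIS_PORT      = tostring(google_redis_instance.cache.port)
    ATLANTIS_REDIS_PASSWORD  = google_redis_instance.cache.auth_string
  }
}
```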

Support Shared VPC deployments

Currently it seems impossible to deploy this module into a project that uses a Shared VPC, as the module tries to create a firewall rule and a route (which is not allowed across projects, and should not happen anyway).

The subnetwork_project also seems to change after each apply, making the module non-idempotent.

Allow users to bring their own KMS key to encrypt the VM attached disks.

Currently, users of Google Cloud Platform (GCP) Virtual Machines (VMs) are only able to encrypt the attached disks of their VMs using a Google-managed key that is stored in Cloud Key Management Service (KMS).

Benefits

Increased security: By allowing users to bring their own KMS key, they have the option to use a key that is stored in their own KMS, rather than relying on a Google-managed key. This can provide an additional layer of security, as the user has full control over the management and rotation of their own key.

Industry best practices: Many organizations have established security policies that require the use of customer-managed keys for data encryption. Allowing users to bring their own KMS key would enable GCP users to comply with these policies and align with industry best practices for data security.
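
A sketch of how a customer-managed key could be passed through to the instance template's disks. The variable name is an assumption, and the other template arguments are omitted. The `dynamic` block keeps the Google-managed default when no key is supplied.

```hcl
variable "disk_kms_key_self_link" {
  type        = string
  description = "Assumed input: self link of a customer-managed Cloud KMS key used to encrypt the attached disks."
  default     = null
}

resource "google_compute_instance_template" "default" {
  # ... other arguments omitted

  disk {
    boot = true

    # Only emit the encryption block when a key has been provided.
    dynamic "disk_encryption_key" {
      for_each = var.disk_kms_key_self_link != null ? [var.disk_kms_key_self_link] : []
      content {
        kms_key_self_link = disk_encryption_key.value
      }
    }
  }
}
```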

Add an FAQ to each example

I suppose there are common mistakes being made, even though the examples are very detailed. We should provide an FAQ per example covering commonly made mistakes.

feat: don't use static tags to tag the firewall rule, route and instance.

As discussed in #64

Users might apply similar tags to instances, firewall rules and routes to allow or deny traffic.

The tag we currently use, atlantis, is very generic and could conflict if there are multiple Atlantis deployments running in the project.

Currently I can think of only 2 solutions:

  • Introduce a new variable that controls the behaviour of the firewall rule and public internet route.
  • Add a random string to the atlantis tag: atlantis-${random_string.tag.result}, for example.
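
The second option could be sketched roughly as follows. The `network_tag` local is a hypothetical name; the same value would be used on the instance, the firewall rule and the route.

```hcl
resource "random_string" "tag" {
  length  = 6
  special = false
  upper   = false # keep the suffix to lowercase letters and digits
}

locals {
  # One tag per module instance, e.g. "atlantis-" plus six random characters.
  network_tag = "atlantis-${random_string.tag.result}"
}
```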

feat: port the startup script to cloudinit

As users might like to provide their own startup script instead (see #41), we should move the commands that we execute in the startup script to cloud-init.

This would not break any existing functionality (running a chown on the new GCE persistent disk mount) and would allow users to bring their own startup script.
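
A rough sketch of what the cloud-init variant might look like. The mount path is taken from the container logs reported elsewhere in this tracker, while the `100:1000` owner is purely an assumption about the user the Atlantis container runs as. The rendered config would be passed to the instance template as `metadata = { "user-data" = local.cloud_init }`, leaving `startup-script` free for users.

```hcl
locals {
  cloud_init = <<-EOT
    #cloud-config
    runcmd:
      # Assumed UID:GID of the Atlantis user inside the container.
      - chown 100:1000 /mnt/disks/gce-containers-mounts/gce-persistent-disks/atlantis-disk-0
  EOT
}
```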

Consider using null label for naming convention

See the null label

https://github.com/cloudposse/terraform-null-label

This is the mixin that's used across all cloudposse modules

https://github.com/cloudposse/terraform-null-label/blob/master/exports/context.tf

This allows using standard inputs such as namespace, tenant, environment, name, attributes, tags, etc

Example of usage

https://github.com/cloudposse/terraform-aws-ecr

https://github.com/cloudposse/terraform-aws-ecr/blob/master/context.tf

https://github.com/cloudposse/terraform-aws-ecr/blob/0472d649275df45dfd47514275e69792d1567d08/main.tf#L8

So this results in all name arguments set to this value

  name = module.this.id
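
For context, a minimal sketch of the label module on its own, with illustrative input values. The version pin is left out here rather than guessed and should be set in real use.

```hcl
module "this" {
  source = "cloudposse/label/null"

  # Illustrative values only.
  namespace   = "acme"
  environment = "prod"
  name        = "atlantis"
}

output "resource_name" {
  # module.this.id joins the non-empty inputs with the configured delimiter.
  value = module.this.id
}
```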

Add the ability to pull the latest dev image

Currently, users of our platform are required to either pull the latest or prerelease-latest version of the atlantis image. We propose adding the ability to pull the latest dev image.

Allow the use of spot instances

Spot instances can be useful for running Atlantis on GCP in combination with a persistent data disk because they allow you to take advantage of lower prices for compute resources while still being able to store your data on a durable, high-performance disk.

By using a PD-SSD to store your data, you can easily restart Atlantis on a new spot instance if the original instance is terminated, without losing any data. This can help you save money on compute costs while still being able to run Atlantis reliably.

Requires #11 to be completed first.
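
A sketch of the scheduling settings this would likely involve on the instance template, with the other arguments omitted. Exposing it behind a variable is left out for brevity.

```hcl
resource "google_compute_instance_template" "default" {
  # ... other arguments omitted

  scheduling {
    provisioning_model = "SPOT"
    preemptible        = true
    # Spot VMs cannot be restarted automatically by Compute Engine.
    automatic_restart = false
    # Stop the VM on preemption instead of deleting it.
    instance_termination_action = "STOP"
  }
}
```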

Web UI Job View of Plans Not Working Behind IAP

I'm seeing strange behavior in the "live plan" view in the web UI: sometimes the full plan doesn't display (especially for larger plans). I used your Terraform module for the GCE setup. If I bypass IAP, everything displays correctly.

I know it was mentioned here that this is due to IAP stripping off the bearer authorization header.

Any ideas for a workaround? Could I put nginx in front of the atlantis docker container to deal with the authorization header issue somehow?
