
Terraform Multispace Provider

The multispace Terraform provider implements resources to help work with multi-workspace workflows in Terraform Cloud (or Enterprise). The goal of the provider is to make it easy to perform cascading creates and destroys in the proper order across a series of dependent Terraform workspaces.

For more details on motivation, see the "Why?" section below.

Warning: Despite my affiliation with HashiCorp, this is NOT an official HashiCorp project and is not supported by HashiCorp. This was created on my personal time for personal use cases.

Features

  • Cascading create/destroy of multiple Terraform workspaces in dependency order.

  • Automatic retry of failed plans or applies within a workspace.

  • Optionally wait for a human to manually confirm a plan for one or more workspaces before continuing.

Installation

See the installation instructions on the Terraform Registry. Generally, add the mitchellh/multispace provider to your required_providers block and run terraform init:

terraform {
  required_providers {
    multispace = {
      source = "mitchellh/multispace"
      version = "<VERSION HERE>"
    }
  }
}

Usage

The example below cascades applies and destroys across multiple workspaces.

The recommended usage includes pairing this with the tfe provider. The tfe provider is used to configure your workspaces, and the multispace provider is used to create a tree of workspaces that are initialized together. A minimal sketch of this pairing follows the example below.

Note on usage: I usually only use this to manage the create/destroy lifecycle today. The steady-state modification workflow uses the standard Terraform Cloud VCS-driven workflows. This provider just helps me stand up my initial environments and subsequently tear them down.

resource "multispace_run" "root" {
  # Use string workspace names here and not data sources so that
  # you can define the multispace runs before the workspace even exists.
  workspace    = "tfc"
  organization = "my-org"
}

resource "multispace_run" "physical" {
  organization = "my-org"
  workspace    = "k8s-physical"
  depends_on   = [multispace_run.root]

  retry = false
}

resource "multispace_run" "core" {
  organization = "my-org"
  workspace    = "k8s-core"
  depends_on   = [multispace_run.physical]
}

resource "multispace_run" "dns" {
  organization = "my-org"
  workspace    = "dns"
  depends_on   = [multispace_run.root]
  manual_confirm = true
}

resource "multispace_run" "ingress" {
  organization = "my-org"
  workspace    = "ingress"
  depends_on   = [multispace_run.core, multispace_run.dns]
}
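
For completeness, here is a minimal sketch of that pairing, modeled on the example above. The VCS settings and the OAuth token variable are placeholders you would supply yourself:

resource "tfe_workspace" "metrics" {
  name         = "metrics"
  organization = "my-org"

  # Placeholder VCS settings; point these at your own repository
  # and OAuth client.
  vcs_repo {
    identifier     = "my-org/metrics"
    oauth_token_id = var.oauth_token_id
  }
}

resource "multispace_run" "metrics" {
  organization = "my-org"
  # Referencing the tfe_workspace resource (rather than a data source)
  # lets Terraform create the workspace before queueing the run.
  workspace    = tfe_workspace.metrics.name
  depends_on   = [multispace_run.core]
}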

Why?

Multiple workspaces are my recommended approach to working with Terraform. Small, focused workspaces make Terraform runs fast, limit the blast radius, and enable easier work separation by teams. The terraform_remote_state data source can be used to pass outputs from one workspace to another workspace. This enables a clean separation of responsibilities. This is also officially recommended by Terraform.
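
For instance, a downstream workspace can consume an upstream workspace's outputs like this (organization, workspace, and output names are illustrative):

data "terraform_remote_state" "k8s_physical" {
  backend = "remote"

  config = {
    organization = "my-org"
    workspaces = {
      name = "k8s-physical"
    }
  }
}

locals {
  # "cluster_endpoint" is a hypothetical output published by the
  # upstream k8s-physical workspace.
  cluster_endpoint = data.terraform_remote_state.k8s_physical.outputs.cluster_endpoint
}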

I also use multiple workspaces as a way to model environments: dev, staging, production, etc. An environment to me is a collection of many workspaces working together to create a working environment. For example, one project of mine has the following workspaces that depend on each other to create a full environment: k8s-physical, k8s-core, dns, metrics, etc.

The problem statement is that I do not have a good way to create my workspaces, create them all at once in the right order, and then destroy them if I'm done with the environment. Without this provider, I have to manually click through the Terraform Cloud UI.

With this provider, I can now create a single Terraform module that is used to launch a complete environment for a project, composed of multiple workspaces. And I can destroy that entire environment with a terraform destroy, which cascades a destroy through all the workspaces in the correct order thanks to Terraform.

Note that Terraform Cloud does provide run triggers, but this doesn't quite solve my problem: I don't generally want run triggers; I mainly want what I'd describe as a "cascading apply/destroy" for creation/destruction. For steady-state modifications once an environment exists, I use the typical Terraform Cloud VCS-driven workflow (which may or may not involve run triggers at that point).

Future Functionality

The list below has functionality I'd like to add in the future:

  • Only create a run if there is no existing state; otherwise, assume initialization is already done. This will allow this provider to be adopted into existing workspace trees more easily.

Developing the Provider

If you wish to work on the provider, you'll first need Go installed on your machine.

To compile the provider, run go install. This will build the provider and put the provider binary in the $GOPATH/bin directory.

To generate or update documentation, run go generate.

In order to run the full suite of Acceptance tests, run make testacc.

Note: Acceptance tests create real resources, and often cost money to run.

$ make testacc


terraform-provider-multispace's Issues

`context deadline exceeded` while triggered run is still queued

When a run is enqueued for a long time because the available workers are tied up, the multispace run errors with context deadline exceeded. I've noticed this specifically in destroy runs. A custom timeout has been set, but it doesn't seem to take effect for destroy runs (the same issue happens on create, but only after the configured timeout, as expected).

Terraform Version

Terraform 1.0.8, 1.0.9
multispace 0.1.0

Affected Resource(s)

Please list the resources as a list, for example:

  • multispace_run

If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.

Terraform Configuration Files

resource "tfe_workspace" "app" {
  for_each          = local.apps
  name              = "app-${each.key}-${var.aws_region}-${var.environment}"
  description       = "Terraform configuration for app-${each.key}"
  organization      = var.tfe_organization_name
  auto_apply        = true
  queue_all_runs    = false
  terraform_version = var.terraform_version
  working_directory = "environments/${var.aws_region}/${var.environment}/apps/${each.key}"
  trigger_prefixes  = ["modules", "shared/app"]
  tag_names         = ["app", var.environment]
}

resource "tfe_variable" "environment" {
  for_each     = tfe_workspace.app
  key          = "environment"
  category     = "terraform"
  value        = var.environment
  workspace_id = each.value.id
}

resource "multispace_run" "run" {
  for_each     = tfe_workspace.app
  organization = var.tfe_organization_name
  workspace    = each.value.name

  timeouts {
    create = "1h"
    delete = "1h"
  }

  depends_on = [
    # wait for all vars to be set before triggering run
    tfe_variable.environment,
  ]
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://gist.github.com/pedroslopez/fffcbb4f1786246ddea8d84dacfebac5

Gist from a different workspace where I was able to reproduce the issue.

Expected Behavior

What should have happened?

On destroy, the multispace_run should have waited up to the configured destroy timeout while the related run was still queued, or ideally it should keep waiting as long as the run is still queued.

Actual Behavior

What actually happened?

After 15 minutes, the run failed with context deadline exceeded. The run triggered by multispace_run eventually ran once the workers became available, but by then the deadline error had already happened.

Steps to Reproduce

This can easily be reproduced in a free Terraform Cloud organization, where there are not enough workers to process the triggered run. Just have the multispace_run trigger a destroy run and observe that it only waits up to 15 minutes before failing with context deadline exceeded.

Important Factoids

Is there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?

Pretty standard Terraform Cloud for Business organization, but we only have 3 workers, so if multiple workspaces whose resources take a long time to clean up are destroyed at once, we run into this issue.

retry on "configuration version still processing" error prior to run create

Terraform Version

$ terraform -v
Terraform v1.0.11
on darwin_amd64
+ provider registry.terraform.io/hashicorp/tfe v0.26.1
+ provider registry.terraform.io/mitchellh/multispace v0.1.0

Affected Resource(s)

Please list the resources as a list, for example:

  • multispace_run

Terraform Configuration Files

resource "tfe_workspace" "child" {
  name = "multispace-child"

  description  = "Child Workspace"
  organization = "hashi_strawb_testing"
  auto_apply   = true

  vcs_repo {
    identifier     = "lucymhdavies/terraform-provider-multispace"
    oauth_token_id = "ot-TZxcR8MRaTRvteFL"
    branch         = "main"
  }
  working_directory = "test/noop"

  queue_all_runs = false
}

resource "multispace_run" "child" {
  workspace    = tfe_workspace.child.name
  organization = "hashi_strawb_testing"
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output (see https://www.terraform.io/docs/internals/debugging.html). Please do NOT paste the debug output in the issue; just paste a link to the Gist.

https://gist.github.com/lucymhdavies/6c9edec17329989c388d8cf969d0dd32

Expected Behavior

What should have happened?

  • Workspace created
  • Multispace Run triggered

Actual Behavior

What actually happened?

  • Workspace created
  • Multispace Run failed

Steps to Reproduce

A Terraform Apply with the provided code should be sufficient to reproduce the issue.

The issue is intermittent, so may need a few tries before it can be replicated.

Important Factoids

Nothing special, just a standard TFC4B Org

Error: Run entered unexpected state "policy_checked", expected applied

Terraform Version

$ terraform -v
Terraform v1.0.10
on darwin_amd64
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/tfe v0.26.1
+ provider registry.terraform.io/hashicorp/time v0.7.2
+ provider registry.terraform.io/mitchellh/multispace v0.1.0

Your version of Terraform is out of date! The latest version
is 1.0.11. You can update by downloading from https://www.terraform.io/downloads.html

Affected Resource(s)

multispace_run

Expected Behavior

What should have happened?

terraform apply should have triggered a run on a workspace, and succeeded once that workspace had completed

Actual Behavior

What actually happened?

╷
│ Error: Run "run-Bd9KyxHaiT5bQgVw" entered unexpected state "policy_checked", expected applied
│
│   with multispace_run.webserver["dev"],
│   on main.tf line 96, in resource "multispace_run" "webserver":
│   96: resource "multispace_run" "webserver" {
│

Steps to Reproduce

  • Have some sentinel policies enabled on a workspace
  • Workspace should also require manual approval before apply
  • Trigger the workspace with multispace_run (a reproduction sketch follows this list)
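
A minimal sketch of that setup, assuming the tfe provider's tfe_sentinel_policy and tfe_policy_set resources (the policy body, names, and organization are hypothetical):

resource "tfe_sentinel_policy" "noop" {
  name         = "always-pass"
  organization = "my-org"
  policy       = "main = rule { true }" # trivial policy; always passes
  enforce_mode = "soft-mandatory"
}

resource "tfe_workspace" "target" {
  name         = "policy-checked-repro"
  organization = "my-org"
  auto_apply   = false # require manual approval before apply
}

resource "tfe_policy_set" "repro" {
  name          = "policy-checked-repro"
  organization  = "my-org"
  policy_ids    = [tfe_sentinel_policy.noop.id]
  workspace_ids = [tfe_workspace.target.id]
}

resource "multispace_run" "repro" {
  organization = "my-org"
  workspace    = tfe_workspace.target.name
}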

Important Factoids

Is there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?

References

To my layperson's eye, it appears the issue is that the provider does not account for the possibility that a run may be in a RunPolicy* or RunCost* state.

Seems like a relatively simple fix, so I may see if I can PR it myself :)

Unaccounted Run States in `resource_run` -> `waitForRun()`

Depending on the specific TFE/TFC organization and/or workspace settings, it appears that resource_run may attempt invalid run transitions. That is, it attempts to apply a run when an apply is not an available action. For example, when the workspace targeted by resource_run has cost estimation enabled and the run's plan is finished:

-----------------------------------------------------: timestamp=2021-10-19T13:05:38.006-0500
2021-10-19T13:05:38.007-0500 [INFO]  provider.terraform-provider-multispace_v0.1.0: 2021/10/19 13:05:38 [DEBUG] non-progressive state, exiting "cost_estimating": timestamp=2021-10-19T13:05:38.007-0500
2021-10-19T13:05:38.007-0500 [INFO]  provider.terraform-provider-multispace_v0.1.0: 2021/10/19 13:05:38 [INFO] plan complete, confirming apply. "<some-run-id>": timestamp=2021-10-19T13:05:38.007-0500
2021-10-19T13:05:38.008-0500 [INFO]  provider.terraform-provider-multispace_v0.1.0: 2021/10/19 13:05:38 [DEBUG] TFE API Request Details:
---[ REQUEST ]---------------------------------------
POST /api/v2/runs/<some-run-id>/actions/apply HTTP/1.1
Host: app.terraform.io

< ... >

{
 "data": {
  "type": "",
  "attributes": {
   "comment": "terraform-provider-multispace on Tue Oct 19 13:05:38 CDT 2021"
  }
 }
}

-----------------------------------------------------: timestamp=2021-10-19T13:05:38.008-0500
multispace_run.certificates: Still creating... [4m40s elapsed]
2021-10-19T13:05:38.059-0500 [TRACE] dag/walk: vertex "multispace_run.infrastructure" is waiting for "multispace_run.certificates"
2021-10-19T13:05:38.080-0500 [INFO]  provider.terraform-provider-multispace_v0.1.0: 2021/10/19 13:05:38 [DEBUG] TFE API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 409 Conflict

<...>

{
 "errors": [
  {
   "status": "409",
   "title": "transition not allowed"
  }
 ]
}
-----------------------------------------------------: timestamp=2021-10-19T13:05:38.080-0500
2021-10-19T13:05:38.083-0500 [TRACE] maybeTainted: multispace_run.certificates encountered an error during creation, so it is now marked as tainted

<...>

│ Error: transition not allowed

│   with multispace_run.certificates,
│   on integration_tests.tf line 33, in resource "multispace_run" "certificates":
│   33: resource "multispace_run" "certificates" {


2021-10-19T13:05:38.129-0500 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2021-10-19T13:05:38.131-0500 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
2021-10-19T13:05:38.132-0500 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"

and the tl;dr version of those debug logs:

non-progressive state, exiting "cost_estimating": timestamp=2021-10-19T13:05:38.007-0500
plan complete, confirming apply. "<some-run-id>": timestamp=2021-10-19T13:05:38.007-0500
Error: transition not allowed

I'm unclear on what the TFC API maintainers would consider the canonical method for determining when a run is ready to receive an apply action request. Perhaps the actions attribute on the run would be a better fit? The is-confirmable key in particular:

{
 "data": {
  "id": "<some-run-id>",
  "type": "runs",
  "attributes": {
   "actions": {
    "is-cancelable": true,
    "is-confirmable": false,
    "is-discardable": false,
    "is-force-cancelable": false
   },
   # ...
}

Anywho, I would be happy to try submitting a patch along these lines at some point but figured I would lodge an issue on the matter in the meantime. 😄

Provider should now be redundant

Thanks to hashicorp/terraform-provider-tfe#742
(and the bugfixes which followed its initial release)

As of v0.46.0 of the TFE provider I'm personally satisfied that tfe_workspace_run does everything this provider does.

Thanks for creating the provider in the first place, but I think it's no longer required :)

I've got a blog post coming out early next week talking more about tfe_workspace_run, but would you like to add something to the README in this repo pointing folks to tfe_workspace_run?
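
For anyone migrating, a rough tfe_workspace_run equivalent of a single multispace_run might look like the sketch below. This is based on my reading of the tfe provider's v0.46.0-era schema, so treat the attribute names as assumptions and consult the provider documentation for the authoritative set; tfe_workspace.core is a hypothetical workspace resource:

resource "tfe_workspace_run" "core" {
  workspace_id = tfe_workspace.core.id

  apply {
    manual_confirm = false # auto-confirm the plan
    wait_for_run   = true  # block until the apply finishes
  }

  destroy {
    manual_confirm = false
    wait_for_run   = true
  }
}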

Add optional "wait_for_apply" and "wait_for_destroy" parameters on multispace_run

The provider is great for me running TFC demos. I love the ability to create a bunch of workspaces, and then kick them off.

For my use-case, it's more useful for me to kick off the Plan/Apply, without waiting for it to complete. i.e. I want my bootstrap workspace to create the child workspace, kick off a plan, and leave it at that.

For my use-case, it's also useful that when kicking off a Destroy run, it should wait for it to complete. i.e. I do not want my bootstrap workspace to destroy a child workspace until that workspace has successfully finished a Destroy.

As such (and I hope to provide a PR for this), I propose adding wait_for_apply and wait_for_destroy parameters to the resource, both of which would default to true. A usage sketch follows below.

These two parameters would be used to conditionally run this block from doRun:
https://github.com/mitchellh/terraform-provider-multispace/blob/main/internal/provider/resource_run.go#L200-L316

It may also be useful to avoid setting AutoApply to false for these cases (and use workspace default).
https://github.com/mitchellh/terraform-provider-multispace/blob/main/internal/provider/resource_run.go#L173-L174
but that's something I may add as part of a different PR
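
A hypothetical usage of the proposed parameters (these do not exist in the current provider; the names are as proposed above):

resource "multispace_run" "child" {
  organization = "my-org"
  workspace    = "demo-child"

  # Proposed, not-yet-implemented parameters.
  # Kick off the plan/apply without blocking on its completion...
  wait_for_apply = false

  # ...but still block on destroy so teardown stays ordered.
  wait_for_destroy = true
}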

Additional workspace states (e.g. post_plan_running) the provider does not expect

Moved out of #68

Found a few more instances of unhandled workspace states, so either a similar fix to #8 is needed, or perhaps something more forward-compatible.

(i.e. list all the states we do expect, and do nothing for the ones we do not expect)

Debug Output

-----------------------------------------------------: timestamp=2022-10-07T14:01:17.596+0100
2022-10-07T14:01:17.597+0100 [INFO]  provider.terraform-provider-multispace_v0.2.0: 2022/10/07 14:01:17 [DEBUG] non-progressive state, exiting "post_plan_running": timestamp=2022-10-07T14:01:17.597+0100
2022-10-07T14:01:17.597+0100 [INFO]  provider.terraform-provider-multispace_v0.2.0: 2022/10/07 14:01:17 [INFO] plan complete, confirming apply. "run-sQQsTk8Nt8qQXxXB": timestamp=2022-10-07T14:01:17.597+0100
2022-10-07T14:01:17.597+0100 [INFO]  provider.terraform-provider-multispace_v0.2.0: 2022/10/07 14:01:17 [DEBUG] TFE API Request Details:
---[ REQUEST ]---------------------------------------
POST /api/v2/runs/run-sQQsTk8Nt8qQXxXB/actions/apply HTTP/1.1
Host: app.terraform.io
User-Agent: go-tfe
Content-Length: 109
Accept: application/vnd.api+json
Authorization: Bearer S6JcndyjRR8ZJQ.atlasv1.REVOKED_TOKEN
Content-Type: application/vnd.api+json
Accept-Encoding: gzip

{
 "data": {
  "type": "",
  "attributes": {
   "comment": "terraform-provider-multispace on Fri Oct 7 14:01:17 BST 2022"
  }
 }
}

-----------------------------------------------------: timestamp=2022-10-07T14:01:17.597+0100
2022-10-07T14:01:17.720+0100 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for multispace_run.trigger_workspaces["webserver-aws-dev"]
2022-10-07T14:01:17.720+0100 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for multispace_run.trigger_workspaces["webserver-aws-dev"]
2022-10-07T14:01:17.721+0100 [INFO]  provider.terraform-provider-multispace_v0.2.0: 2022/10/07 14:01:17 [DEBUG] TFE API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 409 Conflict
Content-Length: 62
Cache-Control: no-cache
Content-Type: application/vnd.api+json; charset=utf-8
Date: Fri, 07 Oct 2022 13:01:17 GMT
Referrer-Policy: strict-origin-when-cross-origin
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Tfp-Api-Version: 2.5
Vary: Accept-Encoding
Vary: Accept, Origin
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit: 30
X-Ratelimit-Remaining: 27
X-Ratelimit-Reset: 0.043
X-Request-Id: 8419f58a-d2d5-46c0-2ba0-f9e3cffe4a30
X-Xss-Protection: 1; mode=block

{
 "errors": [
  {
   "status": "409",
   "title": "transition not allowed"
  }
 ]
}
