Comments (3)
Per Issue #838 and PR #839, the leader node will be terminated as the very last node in the rolling update when `export INFRAKIT_GROUP_POLICY_SELF_UPDATE=last` is set. This is also the default behavior (see https://github.com/docker/infrakit/blob/master/pkg/run/v0/group/group.go#L70). Please verify this behavior.
This will address 1. of above. If 1. is guaranteed, the next step is to ensure we can properly terminate the VM and all of its resources in a predictable way, since the "self" node can shut down at any time due to the VM termination, and `terraform apply` could be mid-flight, potentially leaving Terraform files on disk in a corrupted state.
from deploykit.
How can we delete the VM and its associated resources in a way that tolerates `terraform apply` being interrupted mid-flight when the self node is shut down?
Thinking through how Terraform works... I wonder if this can be done at all... If the self node is terminated as part of `terraform apply`, that process will just die mid-flight. Will this leave the Terraform state files on disk in a corrupted state? If we know that Terraform at least guarantees file / state consistency at per-resource granularity, then we could do something with creating tombstones of the resources we need to delete:
- Determine the list of resources that need to be terminated per instance destroy (the VM instance, the volumes).
- Create a folder on disk for the 'delete' operation, for example `delete-<timestamp>`.
- In this directory, create symlinks to all the files to be deleted.
- At the top-level directory, change a symlink (e.g. `delete-current`) to point to this new directory.
- After the symlink is created, start deleting every file linked in the `delete-current` directory.
- Now call `terraform apply`. Terraform will start deleting resources and update its state file as it proceeds (or maybe wait for everything to be deleted, then 'commit').
- The node running the `terraform apply` is terminated. Everything goes out.
- At this point, the other running manager nodes detect that the current leader just went offline. A new round of leader election takes place and a new leader (an already updated node) takes over.
- The new leader starts up.
- The new leader looks at the Terraform state files on its disk (which is a shared / global mount amongst the managers). It makes sure that all the symlinks in the `delete-current` directory point to no files... If any symlink resolves (`os.Readlink()`), it should remove the linked file.
- The new leader (its Terraform plugin) now calls `terraform apply` again.
- `terraform apply` now runs on the new leader node... and reconciles the infra resources with the on-disk files.
The big assumption here is that any files that Terraform writes (its own state files, not the ones we create/delete) do not get corrupted mid-flight. This is a pretty big assumption. @kaufers, is there a way you can verify this?
If we don't want to make this assumption or don't trust what is said on the tin, then we would have to do something more coordinated. See my comments on #838
@chungers I think that what you have for #838 and #839 might actually solve this issue. Today, with the "resource" counting, we remove the "globally" scoped resource files when the last VM that references them is destroyed. In this case, that means that the `terraform apply` will include the `destroy` call for all of the resources (including the `self` VM).
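For readers unfamiliar with the counting, the idea is roughly the following. This is a simplified sketch, not the plugin's actual code: `orphanedBy` and the map layout are hypothetical, and it assumes each VM lists a given shared resource at most once.

```go
package main

import "sort"

// orphanedBy returns the globally scoped resources that are referenced only
// by the VM being destroyed, i.e. the ones whose files can be removed along
// with it. refs maps a VM name to the shared resources it references.
func orphanedBy(refs map[string][]string, vm string) []string {
	count := map[string]int{}
	for _, resources := range refs {
		for _, r := range resources {
			count[r]++
		}
	}
	var orphaned []string
	for _, r := range refs[vm] {
		if count[r] == 1 {
			orphaned = append(orphaned, r)
		}
	}
	sort.Strings(orphaned)
	return orphaned
}
```

When the `self` VM is the last one holding a reference, its destroy and the destroy of the shared resources end up in the same `terraform apply`.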
In my testing on IBM Cloud, the resource destroy API call returns pretty quickly and there is a delay (up to a few minutes) before the actual VM is powered down. This provides plenty of time for all of the resources to be destroyed.
We hit issues when the manager group destroy deletes the current leader first. Once the updates are merged to ensure destroy ordering I'll provide an update to this issue (there may no longer be problems).