
Terrible: IaC for QEMU/KVM


This Ansible playbook allows you to initialize and then deploy an entire infrastructure, with the aid of Terraform, on a QEMU/KVM environment.


Table of Contents

  1. Abstract
  2. Requirements
  3. Configuration
  4. Compatibility
  5. Installation
  6. Usage
  7. Authors
  8. License

Abstract

Infrastructure as Code has seen considerable growth in the cloud lately. However, a separate discussion has to be made regarding the private cloud. For various reasons, companies may need to use an internal infrastructure instead of a cloud one but, unfortunately, there aren't many solutions suitable for the private cloud that fit most companies' needs. Our main idea was to implement a flexible and powerful solution to build an infrastructure from scratch, effortlessly. Using Ansible's flexibility and Terraform's power allows users to abstract both the creation and the provisioning of the entire infrastructure, describing the whole process in a single file that is easy to read and to maintain in the long run.

Why not just use a single tool?

On our side, the need for a single source of truth for both infrastructure creation and provisioning led to this choice. As said, our objective was achievable thanks to Ansible's flexibility combined with Jinja2's simplicity, which lets us dynamically generate HCL files and leverage Terraform's power and compatibility to build the infrastructure in no time.

How It Works

The basic idea comes from the complexity of automating VM deployments in a QEMU/KVM environment.

For this reason we decided to automate the deployment process as much as possible, using Ansible and Jinja2.

First of all, we provide a basic HCL file (templated with Jinja2) describing a basic VM implementation. This is what is usually called IaC (Infrastructure as Code).

Then, by using Terraform and its amazing libvirt provider (https://github.com/dmacvicar/terraform-provider-libvirt), we can finally deploy the resulting HCL files generated by Ansible.
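
To give a feel for it, a fragment of such a Jinja2-templated HCL file might look like the following sketch (the resource layout and names here are illustrative, not the project's actual terraform-vm.tf.j2):

resource "libvirt_volume" "os_image" {
  name   = "{{ inventory_hostname }}.qcow2"  # one volume per VM
  pool   = "{{ pool_name }}"                 # storage pool from the inventory
  source = "{{ disk_source }}"               # base qcow2 image
}

resource "libvirt_domain" "vm" {
  name   = "{{ inventory_hostname }}"
  vcpu   = {{ cpu }}                         # CPUs from the inventory
  memory = {{ memory }}                      # RAM from the inventory
}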

The figure below describes the process in an easier way.

kvm-deployment

As you can see, we start with the templated file (terraform-vm.tf.j2).

When Ansible runs, it generates n .tf files, depending on the VMs specified in the inventory. This is the result of the initialization phase. Once this task finishes, the files are complete and ready to be used by Terraform.

At this point, Ansible takes those files and runs Terraform on each of them. Once the task finishes, the VMs previously described in the inventory will be deployed to the QEMU/KVM server(s).

Requirements

Dependency                 | Minimum Version | Reference
---------------------------|-----------------|----------
Ansible                    | 2.9+            | https://docs.ansible.com/
Terraform                  | 0.12+           | https://www.terraform.io/docs/index.html
Terraform Provider Libvirt | 0.6+            | https://github.com/dmacvicar/terraform-provider-libvirt/
Libvirt (on the target)    | 1.2.14+         | https://libvirt.org/docs.html
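
You can quickly check that your environment satisfies these minimums, for example:

ansible --version
terraform version
virsh --version    # run on the target to check libvirt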

Configuration

First of all, you have to compose the inventory file in the right way. That means you have to describe the VMs you want to deploy onto the server(s).

As you will see, there are some interesting variables you can use to properly describe your infrastructure. Some of them are required, others are optional.

Below you can see the basic structure for the inventory.

all:
    vars:
        ...
    hosts:
        terraform_node:
            ...
        hypervisor_1:
            ...
        hypervisor_2:
            ...
    children:
        deploy:
            vars:
                pool_name: ...
                ...
            children:
                group_1:
                    hosts:
                        host_1:
                            ...
                    vars:
                        ...
                group_2:
                    hosts:
                        host_2:
                            ...
                        host_3:
                            ...
                group_3:
                    hosts:
                        host_4:
                            ...
                    vars:
                        ...
                group_4:
                    hosts:
                        host_5:
                            ...

Under the first hosts tag, you can specify the terraform node and the various hypervisors you want to use to distribute your infrastructure.

Here's a little example:

all:
    hosts:
        terraform_node:
            ansible_host: 127.0.0.1
            ansible_connection: local
    vars:
        ...
    children:
        deploy:
            vars:
                pool_name: default
        ...

In the above example, we specified the uri of the QEMU/KVM server (which is going to be common among all the VMs in this specific hypervisor group), the storage pool name for the QEMU/KVM server, and the terraform node address, which describes where Terraform is installed and where it is going to run.

Now, for each VM, we want to specify some properties such as the number of CPUs, the amount of RAM, the network interfaces, etc.

Here's a little example:

all:
    hosts:
        terraform_node:
            ansible_host: 127.0.0.1
            ansible_connection: local
        hypervisor_1:
            ansible_host: 127.0.0.1
            ansible_connection: local
    children:
        deploy:
            vars:
                pool_name: default
                disk_source: "~/VirtualMachines/centos8-terraform.qcow2"
            children:
                group_1:
                    hosts:
                        host_1:
                            os_family: RedHat
                            cpu: 4
                            memory: 8192
                            hypervisor: hypervisor_1
                            network_interfaces:
                                ...
                group_2:
                    hosts:
                        host_2:
                            os_family: RedHat
                            cpu: 2
                            hypervisor: hypervisor_1
                        host_3:
                            os_family: Suse
                            disk_source: "~/VirtualMachines/opensuse15.2-terraform.qcow2"
                            cpu: 4
                            memory: 4096
                            set_new_password: password123
                            hypervisor: hypervisor_1
                            network_interfaces:
                                ...

In this example, we specified 2 main groups (group_1, group_2) and linked the VMs to hypervisor_1. Those groups are made of 3 VMs (host_1, host_2, host_3). As you can see, not all the properties have been specified for each machine. This is possible thanks to the default variable values provided by this playbook.

Thanks to the variables hierarchy in Ansible, you can configure variables:

  • Hypervisor wise
  • VM group wise
  • Single VM wise

This makes it easier to manage large homogeneous clusters, while still retaining the power of per-VM customization.
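
For instance, a value set at group level can be overridden for a single VM; a minimal sketch (host names are illustrative):

group_1:
    vars:
        cpu: 2            # group-wide default
        memory: 2048      # group-wide default
    hosts:
        host_1:
            cpu: 4        # per-VM override: host_1 gets 4 CPUs
        host_2:           # inherits cpu: 2 and memory: 2048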

Workflow

In the above example, the default OS family for the VMs (set through the group-level disk_source) is CentOS, but we specified a different one for host_3, a single openSUSE node. The same pattern works for any other hypervisor group: for instance, a group could default to Ubuntu while a single node overrides it with CentOS.

This kind of configuration granularity is valid for any variable in the playbook.

You can check the default values under defaults/main.yml.

Variables

Once you understand how to fill in the inventory file, you are ready to check all the available variables to generate your infrastructure.

These variables describe the single VMs; each is marked as required or optional:

  • ansible_host: required. Specifies the IP address for the VM. If not specified, a random IP is assigned.
  • ansible_jump_hosts: required if terraform_bastion_enabled is True. Specifies one or more jumphosts/bastions for the Ansible provisioning part.
  • cloud_init: optional. Specifies whether the VM uses a cloud-init image or not. False if not specified.
  • data_disks: optional. Specifies additional disks to be added to the VM. Check the disks section for the internal required variables: HERE
  • disk_source: required. Specifies the (local) path to the virtual disk you want to use to deploy the VMs.
  • hypervisor: required. Specifies on which hypervisor to deploy the infrastructure.
  • network_interfaces: required. Specifies the VM's network interfaces. Check the network section for the internal required variables: HERE
  • os_family: required. Specifies the OS family for the installation. Possible values are: RedHat, Debian, Suse, Alpine, FreeBSD.
  • pool_name: required. Specifies the storage pool name where you want to deploy the VMs on the QEMU/KVM server.
  • ssh_password: required. Specifies the password to access the deployed VMs.
  • ssh_port: required. Specifies the port to access the deployed VMs.
  • ssh_public_key_file: required. Specifies the ssh public key file to deploy on the VMs.
  • ssh_user: required. Specifies the user to access the deployed VMs.

Ansible hosts required outside the deploy group:

  • terraform_node: required. Specifies the machine that performs the Terraform tasks.
  • hypervisor_[0-9]+: required. Specifies the machine(s) that work as QEMU/KVM hypervisors (at least one machine is needed). The default value of 127.0.0.1 indicates that the machine performing the Terraform tasks is the same one running the Ansible playbook. In case the Terraform machine is not the localhost, you can specify the ip/hostname of the Terraform node. More details can be found HERE.

These variables are optional: sensible defaults are set up, and most of them can be declared at hypervisor scope, vm-group scope, or per-vm scope:

  • change_passwd_command: optional. Specifies a different command to be used to change the user's password. If not specified, the default command is used. Default: echo root:{{ set_new_password }} | chpasswd. This variable becomes really useful when you are using a FreeBSD OS.
  • cpu: optional. Specifies the number of CPUs for the VM. If not specified, the default value is taken. Default: 1
  • memory: optional. Specifies the amount of RAM for the VM. If not specified, the default value is taken. Default: 1024
  • set_new_password: optional. Specifies a new password to access the VM. If not specified, the default value (ssh_password) is taken.
  • terraform_custom_provisioners: optional. Specifies custom shell commands to run on newly created instances BEFORE Ansible starts setting them up. Default: ""
  • terrible_custom_provisioners: optional. Specifies custom shell commands to run AFTER the terrible run is completed (for example calling a specific ansible-pull for each node). Default: ""
  • vm_autoboot: optional. Specifies if the VM should be automatically started at boot. Default: False
  • base_deploy_path: optional. Specifies where the Terraform files and state will be deployed. Default: $HOME
  • state_save_file: optional. Specifies where the output terrible state is stored. Default: PATH_TO_THE_INVENTORY-state.tar.gz
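
As a sketch, a few of these optional variables set at group scope could look like this (values are purely illustrative):

deploy:
    vars:
        vm_autoboot: True
        set_new_password: password123
        terrible_custom_provisioners:
            - pip3 install -U ansible ansible-base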

Terraform Node, Bastions & Jumphosts

The following section describes some different deployment scenarios that you may typically encounter.

As described above, the terraform_node variable is required. The terraform_node can be local or remote.

A really common scenario has a local Terraform node. This can be declared as follows:

all:
    vars:
        ...
    hosts:
        terraform_node:
            ansible_host: 127.0.0.1
            ansible_connection: local
        hypervisor_1:
            ansible_host: 127.0.0.1
            ansible_connection: local

This scenario assumes that the terraform_node is the same host that is running the Ansible playbook. This case will ask Terraform to connect to QEMU/KVM using the uri qemu:///system.
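
In HCL terms, the provider configuration generated for this scenario would be equivalent to something like the following sketch (not the literal generated file):

provider "libvirt" {
  uri = "qemu:///system"
  # for a remote hypervisor this becomes e.g.
  # qemu+ssh://root@remote_kvm_machine.domain/system
}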

local-all


If you want to use a remote QEMU/KVM server instead, you can do this:

all:
    vars:
        ...
    hosts:
        terraform_node:
          ansible_host: 127.0.0.1
          ansible_connection: local
        hypervisor_1:
          ansible_host: remote_kvm_machine.domain
          ansible_user: root
          ansible_port: 22
          ansible_ssh_pass: password

This case will ask Terraform to connect to the QEMU/KVM server using the following uri: qemu+ssh://root@remote_kvm_machine.domain/system. This also sets up Terraform's internal ssh connection to use the hypervisor as a bastion host to connect to its VMs.

remote-kvm


Terraform can also be separated from Ansible and located on a remote server. You can declare this simply by using the ansible_host variable, as follows:

all:
    vars:
        ...
    hosts:
        terraform_node:
          ansible_host: remote_terraform_node.domain
          ansible_connection: ssh # or paramiko or whatever NOT local
        hypervisor_1:
          ansible_host: remote_terraform_node.domain
          ansible_connection: ssh # or paramiko or whatever NOT local
          ansible_user: root
          ansible_port: 22
          ansible_ssh_pass: password

This assumes that the terraform_node is the same host that runs the QEMU/KVM hypervisor. This case will ask Terraform to connect to QEMU/KVM using the uri qemu:///system. The post-deployment tasks of this Ansible playbook have to use a jumphost to access the VMs on the internal network. For this reason, we use the terraform_node as a jumphost to reach them.

remote-terraform


Also, if you have a remote QEMU/KVM server and a remote Terraform server, you can use them as follows:

all:
    vars:
        ...
    hosts:
        terraform_node:
          ansible_host: remote_terraform_node.test.com
          ansible_connection: ssh # or paramiko or whatever NOT local
        hypervisor_1:
          ansible_host: remote_kvm_machine.domain
          ansible_user: root
          ansible_port: 22
          ansible_ssh_pass: password

This case will ask Terraform to connect to QEMU/KVM using the uri qemu+ssh://root@remote_kvm_machine.domain/system. This also sets up Terraform's internal ssh connection to use the hypervisor as a bastion to connect to its VMs. Since the Terraform node is itself remote, this case will set up 2 jumphosts for Ansible: the first one is the terraform_node, the other one is the hypervisor's ansible_host.

remote-all

Network

Network declaration is mandatory and per-VM.

Declare each device you want to add inside the network_interfaces dictionary.

Be aware that:

  • the order of declaration is important
  • the NAT device should always be present (unless you can control your DHCP leases for the external devices) and it should be the first device. This is important because the playbook needs it to communicate with the VM before setting up all the user-space networks.
  • the default_route should be assigned to one interface for the VM to function properly. If not set, it defaults to False.

Supported interface types:

  • nat
  • macvtap
  • bridge

Structure:

        all:
            vars:
                pool_name: default
                disk_source: "~/VirtualMachines/centos8-terraform.qcow2"
            hosts:
                terraform_node:
                    ansible_host: 127.0.0.1
                    ansible_connection: local
                hypervisor_1:
                    ansible_host: 127.0.0.1
                    ansible_connection: local
            children:
                group_1:
                    hosts:
                        host_1:
                            ansible_host: 172.16.0.155
                            os_family: RedHat
                            cpu: 4
                            memory: 8192
                            hypervisor: hypervisor_1
                            network_interfaces:
                                # NAT interface, it should always be the first one you declare.
                                # It does not necessarily have to be your default_route or main ansible_host,
                                # but it's important to declare it so Ansible has a way to communicate with
                                # the VM and set up all the remaining networks.
                                iface_1:
                                  name: nat             # mandatory
                                  type: nat             # mandatory
                                  ip: 192.168.122.47    # mandatory
                                  gw: 192.168.122.1     # mandatory
                                  dns:                  # mandatory
                                   - 1.1.1.1
                                   - 8.8.8.8
                                  mac_address: "AA:BB:CC:11:24:68"   # optional
                                  # default_route: False
                                iface_2:
                                  name: ens1p0      # mandatory
                                  type: macvtap     # mandatory
                                  ip: 172.16.0.155  # mandatory
                                  gw: 172.16.0.1    # mandatory
                                  dns:              # optional
                                   - 1.1.1.1
                                   - 8.8.8.8
                                  default_route: True # at least one true mandatory, false is optional.

Variables explanation:

  • name: required. Specifies the name for the connection; this is important for the bridge and macvtap types, as it is the interface/bridge on the host on which they will be created.
  • type: required. Specifies the interface type; supported types are nat, macvtap, bridge.
  • ip: required. Specifies the IP to assign to this interface.
  • gw: required. Specifies the gateway of this interface.
  • default_route: at least one required. Specifies if this interface is the default route or not. At least one interface must be set to True.
  • dns: required. Specifies the DNS list for this interface; this is an array of IPs.
  • mac_address: optional. Specifies the MAC address for this interface.

The playbook will use the available IP returned by the terraform apply command to access the machines, and then use the os_family-specific way to set up the user-space part of the network:

  • static IPs
  • routes
  • gateways
  • DNS

After that, the playbook will set the ansible_host variable to its original value, and proceed with the provisioning.

This is important because it makes ansible_host independent from the internal management interface needed for these network bootstrap tasks, making it easily compatible with any type of role you want to run afterwards.

During this process, virtual networks (bridges, VLANs, etc...) inside the VM are ignored; this improves detection of the networks we want to manage, improving compatibility with docker, k8s, nested-virtualization, etc...

Storage

This section explains how to add additional disks to the VMs.

Suppose you want to create a VM that needs a large amount of storage, plus a separate disk just to store the configurations. This is quite simple to achieve.

The main variable you need is data_disks; under it, you specify the disks and the related properties for each one.

If data_disks is mentioned in your inventory, the following variables are required:

  • size: required. Specifies the disk size, expressed in GB. (e.g. size: 1 means 1GB)
  • pool: required. Specifies the pool where you want to store the additional disks.
  • format: required. Specifies the filesystem format you want to apply to the disk. Available filesystems are specified below.
  • mount_point: required. Specifies the mount point you want to create for the disk; use none if declaring a swap disk.
  • encryption: required. Specifies whether the disk is encrypted. Available values are True or False.

N.B. Each disk declared must have a unique name (eg. you can't use disk0 twice).

OS Family | Supported Disk Formats      | Encryption Supported
----------|-----------------------------|---------------------
Debian    | ext2, ext3, ext4, swap      | yes
Alpine    | ext2, ext3, ext4, swap      | yes
FreeBSD   | ufs, swap                   | no
RedHat    | ext2, ext3, ext4, xfs, swap | yes
Suse      | ext2, ext3, ext4, xfs, swap | yes

Let's take a look at how the inventory file is going to be filled.

        all:
            vars:
                pool_name: default
                disk_source: "~/VirtualMachines/centos8-terraform.qcow2"
            hosts:
                terraform_node:
                    ansible_host: 127.0.0.1
                    ansible_connection: local
                hypervisor_1:
                    ansible_host: 127.0.0.1
                    ansible_connection: local
            children:
                group_1:
                    hosts:
                        host_1:
                            ansible_host: 172.16.0.155
                            os_family: RedHat
                            cpu: 4
                            memory: 8192
                            hypervisor: hypervisor_1
                            # Here we start to declare
                            # the additional disks.
                            data_disks:
                                # Each key is the unique name identifying the disk unit.
                                disk0:
                                    size: 100                       # Disk size = 100 GB
                                    pool: default                   # Store the disk image into the pool = default
                                    format: xfs                     # Disk filesystem = xfs
                                    mount_point: /mnt/data_storage  # The path where the disk is mounted
                                    encryption: True                # Enable disk encryption

                                swp0:
                                    size: 1                         # Disk size = 1 GB
                                    pool: default                   # Store the disk image into the pool = default
                                    format: swap                    # Disk filesystem = swap
                                    mount_point: none               # none, since this is a swap disk
                                    encryption: False               # Do not enable disk encryption

Compatibility

At this time the playbook supports the most common OS families for the Guests:

  • Alpine
  • RedHat
    • RedHat7
    • RedHat8
    • Centos7
    • Centos8
    • RockyLinux
    • Almalinux
    • Fedora and derivatives ( untested )
  • Debian
    • Debian 9
    • Debian 10
    • Ubuntu 18
    • Ubuntu 20
    • Other derivatives ( untested )
  • Suse
    • Leap
    • Tumbleweed
    • Other derivatives ( untested )
  • FreeBSD
    • FreeBSD 12.x
    • FreeBSD 13.x
    • Other derivatives ( untested )

This means you'll be able to generate the infrastructure using ONLY the OSes listed above.

The playbook is agnostic to the hypervisor's OS, as long as the requirements are met.

Installation

Before using Terrible, the following system dependencies need to be installed: dependencies.

Use the following command to satisfy the project dependencies:

pip3 install --user -r requirements.txt

Container Image

To avoid tedious dependency installation to make Terrible work, and to speed up your infrastructure deployment, we provide a Dockerfile to build yourself a minimal image with all you need.

We use a debian:buster-slim image to have a compact system, fully compatible with all the required tools. The container image uses the latest tagged Terrible release.

The minimum requirements to run the container image are Docker (or Podman) and QEMU/KVM installed on the system.

Pull

If you are a lazy person (just like us), you can directly pull the latest image release from DockerHub.

docker pull 89luca89/terrible:latest

Build

To build the image, instead, type the following command:

docker build -t terrible .

This will take some time, grab a cup of coffee and wait.

Run

Once you've built the image, you're ready to run it as in the example below:

docker run \
    -it \
    --rm \
    -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock \
    -v $(pwd)/inventory-test.yml:/terrible/inventory-test.yml \
    -v /path/to/vm/images/:/opt/ \
    -v ~/.ssh/:/root/.ssh/ \
    89luca89/terrible

N.B. If you are using RHEL, CentOS or Fedora, you need to add the --privileged flag, because otherwise SELinux does not allow the container to access the libvirt socket.

N.B. If you are using your local QEMU/KVM instance, remember to add --net=host to the command.

Notes:

  • The volume /var/run/libvirt/libvirt-sock is mandatory if you want to run Terrible locally (against a local QEMU/KVM instance). This way you'll directly interact with the QEMU/KVM API provided by the system.

  • The inventory volume (inventory-test.yml in the example) makes the inventory file available inside the container, in order to deploy the infrastructure.

  • The images volume (/path/to/vm/images/ in the example) makes the qcow2 images available inside the container, in order to deploy them.

  • The volume ~/.ssh makes your ssh keys available inside the container, in order to deploy them on the infrastructure machines.

Usage

To speed up your deployment process, we made our Packer template files available.

That repository can be found here: packer-terraform-kvm.

Once the inventory file is composed, it's time to run your playbook.

To pull up infrastructure:

ansible-playbook -i inventory.yml -u root main.yml

To validate the inventory file:

ansible-playbook -i inventory.yml -u root main.yml --tags validate

To pull down infrastructure (maintaining the resources in place):

ansible-playbook -i inventory.yml -u root main.yml --tags destroy

To completely delete the infrastructure:

ansible-playbook -i inventory.yml -u root main.yml --tags purge
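
These tags combine with standard ansible-playbook options; for example, the issues below mention limiting a purge to a single node (the node name here is a placeholder):

ansible-playbook -i inventory.yml -u root main.yml --tags purge --limit name-of-node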

Outputs

Based on your inventory, the complete state of the infrastructure (TF files, TF states, LUKS keys, etc...) will be written to ${INVENTORY_NAME}-state.tar.gz. This file is essential to keep track of the infrastructure state.

You can (and should) save this state file to keep track of the infrastructure's complete state; the *.tar.gz will be restored and saved on each run.
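
A restore tag is also sketched in the state-handling issues below; where available, you can restore a previously saved state before a run:

ansible-playbook -i inventory.yml -u root main.yml --tags restore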

Authors

License

  • GNU GPLv3, See LICENSE file.

Contributors

89luca89, alegrey91, fossabot, hexared


terrible's Issues

[IMPROVEMENT] Add support for custom commands at the end of the run

It would be nice to add a list of commands to be executed at the end of the setup.

For example:

              terrible_custom_provisioners:
                - pip3 install -U ansible ansible-base
                - ansible-pull -U "{{ ansible_pull_custom_url }}" 

This could be useful for final steps, like setting up a cluster for example

target-specific folder tree

When deploying a new set of VMs on a target hypervisor, we should use a different folder tree for each target, this will preserve terraform states for specific targets.

Example:

~/.ansible-terraform-kvm
    |_ target-uuid_1
           |_ hypervisor_1
                |_ vm1
    |_ target-uuid_2
           |_ hypervisor_1
                |_ vm1

To generate the UUID for a target hypervisor we could use its URI, for example

[BUG] terraform depends on dhcp leases

To keep the code generic, the VM's HCL uses DHCP leases and later switches to a static IP during the Ansible phase.

Doing this works the first time, OR if the lease is free and the machine always gets the requested IP.

If this is not the case, terraform will fail on the 2nd or Nth run, so we should keep the main interface as BOTH dhcp+static.
This ensures terraform can work properly, with the static IP guaranteed even when DHCP is not available.
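
As a sketch, the generated network_interface block could combine both, using attributes the libvirt provider already exposes (the exact shape is still to be decided):

network_interface {
  network_name   = "default"
  addresses      = ["192.168.122.47"]  # requested static IP
  wait_for_lease = true                # still go through a DHCP lease
}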

CI with basic infrastructure deployment

As already mentioned in another thread (#40), this issue is dedicated to the evaluation of the implementation of a Github action to automate the deployment of basic infrastructure, using the following actions:

A basic infrastructure means the ability to deploy one VM for each supported OS family (depending on the time it takes to do the job).

[CI] Migrate from TravisCI

TravisCI.org is no more

we need to migrate to a viable solution for CI testing

We will move to a private baremetal droplet for now.

Improve asserts

Some variables specified in the inventory file need to be verified through Ansible asserts.
Candidate variables could be: os_family, cpu, memory and some IP addresses.
E.g. os_family must be one of the values specified in the os_family_support variable.

Cloud Init Support

Adding cloud-init support for images can be really useful for the disk_source selection.

The terraform-libvirt provider supports cloud-init images by using a specific resource:

data "template_file" "user_data" {
  template = file("./cloud_init.yml")
}

resource "libvirt_cloudinit_disk" "commoninit" {
  name      = "commoninit.iso"
  user_data = data.template_file.user_data.rendered
}

we should add them in the terraform-vm.tf.j2 file inside an if/else checking if this is a cloud-init image

the cloud_init.yml file can be a template, this is an example of user data:

#cloud-config
ssh_pwauth: True
chpasswd:
  list: |
    root:password
  expire: False

this enables the root user via ssh with password, so terrible can use it as a normal image

[IMPROVEMENT] Security updates should not be enforced

Security updates should be optional or delegated to terrible_custom_provisioner

this comes in handy in situations where no internet is available, or to keep track of package versions.

It could be just removed and be a part of terrible_custom_provisioners and left to the user

Add ip management for macvtap/bridged interfaces

For interfaces that are not in a libvirt network (eg. the default NAT, an isolated one etc...) we want the ability to set up a static IP for that interface.

To do this, we should decide if it's something we want to do via terraform, or via ansible.

Example use case:

4 nodes hypervisor cluster, internal shared network between the 4 nodes.
We want the VMs to be reachable via the internal network on all 4 nodes; all VMs on all nodes should reach all other VMs on the other nodes.

We should try to reach a platform agnostic conclusion if possible (eg. NetworkManager vs wicked vs cloud-init )

CI with inventory-example validation

We should create an action that runs the validate target on the inventory-example file (and maybe all the other examples?)

This should ensure that all our examples and YAML documentation are aligned with the latest changes in the code and assertions

Fix CI errors

The new CI lint shows some errors and warnings; we should fix them

[IMPROVEMENT] Support non-netplan Ubuntu

Right now, we are taking for granted that Ubuntu uses netplan.

This is not strictly true, for various reasons:

  • custom image from the user, removing netplan
  • older versions of ubuntu

We should improve the netplan detection itself instead of relying on the ubuntu-vs-debian distinction

Improve Terraform files

We should generate TF files in a way that they describe an entire group or, even better, an entire hypervisor.
This way we avoid situations where, to remove a node, we first have to run --tags purge --limit name-of-node and then remove it from the inventory; the node will instead be automatically deleted when removed from the inventory itself.

[IMPROVEMENT] Add autostart option for VMs

We should add an autostart variable that sets the VM autostart to true or false.
If not specified, default to false.

example:

            cloud-init-node-0:
              os_family: RedHat
              #disk_source: "~/Desktop/cloudinit/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2"
              disk_source: "https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2"
              cpu: 1
              memory: 512
              hypervisor: hypervisor_1
              ansible_host: 192.168.122.200
              cloud_init: True
              autostart: True
              network_interfaces:
                if-0:
                  ...
              data_disks:
                disk-1:
                 ...
                disk-swap:
                 ...

[IMPROVEMENT] Improve Asserts

Some assertions should be improved, for example:

            host-vm-1:
              disk_source: "/mnt/iso/debian10-terraform.qcow2"
              os_family: Debian
              cpu: 1
              memory: 512
              hypervisor: hypervisor_1
              ansible_host: 192.168.122.9
              network_interfaces:
                iface_1:
                  name: default
                  type: nat
                  ip: 192.168.122.9
                  gw: 192.168.122.1
                  dns:
                    - 192.168.122.1
              default_route: True <<<--- note this is unindented

This snippet passes the assertions even though it's not valid
We should review and evaluate more corner cases to include in the assertions

Also we could think of an external tool/linter as @alegrey91 suggests to validate inventories

Problems with multi-hypervisors setup

Right now the use of groups and terraform_node variables to handle multiple hypervisors does not work well, because of: ansible/ansible#32247

This issue suggests using unique groups for each parent group (ex: hypervisor_1, hypervisor_2 etc etc)

While going this route is feasible, I think it will break compatibility with existing roles that expect a specific group structure.

I think we should find a solution that is generic and does not impose specific group names/structures to work.

Rename role

Now that we have a project name, the terraform role should be renamed to terrible

Create Dockerfile

To avoid dependency problems, we could provide a Dockerfile to allow users to build the Terrible container by themselves.
The container should include:

  • Ansible
  • Terraform
  • Terraform Provider for Libvirt

And possibly also other things if necessary. The Travis pipeline could be inspirational for this issue.

Output the state to file

Since terraform is under the hood, we should respect the need to preserve the state of the VM deployments, or else it will not work smoothly on multi-client setups

the idea is to output a tar.gz of the whole ~/.terrible/$playbook_name folder to a file called something like $playbook-name.state.tar.gz, in the same folder as the $playbook file

we should have a restore tag that will take care of restoring the files in ~/.terrible/$playbook_name, but tagged with never as well: this has to be an explicit action (to avoid overriding existing states if not wanted)

so we will have for example

./inventory-1.yml
./inventory-1.yml.state.tar.gz

we will launch

ansible-playbook -i inventory-multihyper.yml -u root main.yml --tags restore

this will populate the folders in ~/.terrible

then we can proceed with a deployment that will have a known state

[BUG] Saved state handling does not manage well remote terraform hosts

If we declare a remote host for terraform_node, the saved state+inventory bundle will remain there.

we should handle the situation:

  when:
    - hostvars['terraform_node']['ansible_connection'] != 'local'

This should use the copy module, if a saved state is present, to send it to the destination, and use the fetch module to retrieve the new saved state file.
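
A minimal sketch of the retrieval part (assuming the archive path lives in the state_save_file variable):

- name: Fetch saved state from the remote terraform node
  fetch:
    src: "{{ state_save_file }}"
    dest: "{{ state_save_file }}"
    flat: yes
  delegate_to: terraform_node
  when:
    - hostvars['terraform_node']['ansible_connection'] != 'local'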

Generate HCL on per-hypervisor basis

Right now, for each VM we generate a separate HCL file in a separate project path;
I think the right way is to create a per-hypervisor project.

Before

host_1
     |_host_1.tf

host_2
     |_host_2.tf

After:

hypervisor_1:
    |_ variables.tf
    |_ host_1.tf
    |_ host_2.tf

hypervisor_2:
    |_ variables.tf
    |_ host_3.tf
    |_ host_4.tf

This way we will follow terraform best practices more closely, create less file flooding, and already have a file hierarchy that closely reflects the infrastructure we are declaring in our inventory

[IMPROVEMENT] Add more targets to CI pipeline

We should add the following targets to the Travis pipeline:

  • Ubuntu 20.04
  • Ubuntu 20.04 - image without netplan
  • Ubuntu 18.04
  • Ubuntu 18.04 - image without netplan

Also we could add (nice to have)

  • Centos 7.x
  • Debian 9

[IMPROVEMENT] Add custom state output folder

It would be nice for it to be configurable;
someone might like to manage it using git, or an s3 bucket, or a specific folder that is rcloned, and so on...

right now, it's:

"{{ ansible_inventory_sources[0] }}-state.tar.gz"

It is a nice default, we could move it to defaults/main.yml, and make it configurable.

add additional disks encryption

When we define additional disks, it could be useful to have an encryption variable to ask Ansible to encrypt the disk.
The variable should be optional, and it also needs other related variables, like password or password_file, from which we can retrieve the password used to encrypt the disk.
The inventory could become as follows:

...
    data_disks:
        disk-storage:
            size: 100
            pool: default
            format: xfs
            mount_point: /mnt/data_storage
            encryption: True
            password: password123

[IMPROVEMENT] Support for other providers

We should support other providers like:

To include multiple providers, we should think about how we want to specify the provider, for example:

    hypervisor_1:
      ansible_host: remote_kvm_machine.foo.bar
      ansible_user: root
      provider_type: Proxmox

Or something similar; we should also think about how the code generation will behave, for example:

roles/terrible/templates/Libvirt
└── ......tf.j2
roles/terrible/templates/Proxmox
└── ......tf.j2
roles/terrible/templates/oVirt
└── ......tf.j2

And use the provider_type as a variable to discover the templates.

Discussion open on how to approach the problem

Variables refactoring

The purpose of this issue is to completely remove Ansible internal variables from the project variables (API variables).
The variables in question are the likes of ansible_host, ansible_ssh_pass and so on.
These variables should be replaced with our own custom variables, to avoid creating confusion.

Add additional data disks declaration

We want to be able to declare (multiple) secondary disks for a VM:

eg:

data_disks:
    disk1:
        size: 10G
        pool: secondary_pool
        format: ext4
        mount_point: /mnt/data_disk1
    disk2:
        ...

This part should be done in a mix of terraform and ansible.

Terraform:

  • create disk in the pool
  • attach disk to VM

Ansible:

  • create disk mountpoint
  • format disk partition
  • eventually encrypt it
  • add entry in fstab

Documentation restyling

Some sections need to be updated.
Also, we should include a Table of Contents section.
Other ideas can be placed below.

improve jump host integration between terraform and ansible

We should further streamline the jump host definition between the ansible and terraform parts.

Right now we have the terraform bastion declaration like this:

        provider_uri: "qemu+ssh://root@10.90.20.12/system"
        terraform_bastion_enabled: True
        terraform_bastion_host: 10.90.20.12
        terraform_bastion_password: password
        terraform_bastion_port: 22
        terraform_bastion_user: root

This correctly declares the jump host that the terraform remote-exec commands will use to provision the VMs

This will not work on ansible so we need to add:

        ansible_jump_hosts:
          - {user: root, host: 10.90.20.12, port: 22}

This setup translates to something like:

Ansible + Terraform Server --> KVM Server

In a simple setup like this, we should try to have a much simpler setup, like:

  • auto-detect if we need a jump-host for terraform if we have ssh in the provider_uri
  • auto-setup the ansible_jump_hosts if we have terraform_bastion detected.

This however could be tricky if we have a different terraform server, separated from the ansible one:

Ansible Server --> Terraform + KVM Server

This example will indeed need ansible_jump_hosts setup, but NOT terraform_bastion

We should:

  • auto-detect that provider_uri does NOT contain ssh
  • terraform_node is NOT local
  • auto-setup the ansible_jump_hosts accordingly (being the terraform_node the jump_host)

The worst situation to auto-detect is when we have a completely disjointed setup:

Ansible Server --> Terraform Server --> KVM Server

In this situation we have:

  • provider_uri with ssh, so we need to set terraform_bastion and add a hop to ansible_jump_hosts accordingly
  • terraform_node is NOT local, so add another hop to ansible_jump_hosts

So the flow should be:

terraform_bastion = false
ansible_jump_hosts = []

if provider_uri does not contain ssh:
    # we have a qemu:///system situation
    if terraform_node is local:
          # ansible, terraform and kvm all on the same machine
          return
    else:
          # local ansible and remote terraform+KVM machine
          ansible_jump_hosts.append(terraform_node)
else
     # we have a qemu+ssh://user@remote_host/system
     if terraform_node is local:
           # we have a local ansible+terraform and remote KVM machine
           ansible_jump_hosts.append(user@remote_host)
      else:
            # we have a local ansible, remote terraform and another remote KVM machine
            ansible_jump_hosts.append(user@remote_host)
            ansible_jump_hosts.append(terraform_node)
           

Assertion too strict on terraform target hypervisor

# Verify the correctness of the parameters.
- name: Validate 'terraform_target_hypervisor' parameter
  assert:
    quiet: yes
    that:
      - hostvars['terraform_node']['terraform_target_hypervisor'] | ipaddr
    fail_msg: >
      You are trying to use an unsupported terraform_target_hypervisor value.
  when:
    - hostvars['terraform_node']['terraform_target_hypervisor'] is defined

@alegrey91 this is too strict, a hostname should be a valid value
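
A possible relaxation (a sketch, accepting either an IP address or a plausible hostname):

- name: Validate 'terraform_target_hypervisor' parameter
  assert:
    quiet: yes
    that:
      - (hostvars['terraform_node']['terraform_target_hypervisor'] | ipaddr) or
        (hostvars['terraform_node']['terraform_target_hypervisor'] is match('^[a-zA-Z0-9.-]+$'))
    fail_msg: >
      You are trying to use an unsupported terraform_target_hypervisor value.
  when:
    - hostvars['terraform_node']['terraform_target_hypervisor'] is defined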

[IMPROVEMENT] Add basic tooling

We should have some basic tools to ease the use of the project

Ideally we can gather them in a ./utils folder

Some basic tools can be

  • generate_inventory to create a basic inventory from a template, passing arguments to it
  • terrible
    • up to easily run the playbook
    • down to shut down VMs preserving resources
    • purge to purge a playbook
    • validate to validate an existing playbook
    • check_deps to check if the system meets all terrible requirements

Open to other ideas

[IMPROVEMENT] Improve inventory defaults

Right now, setting up a VM can take quite a bit of YAML code, so we should improve this by giving better and sane defaults for all aspects of a VM

Right now we could improve things by setting up:

  1. if only 1 network is defined, it's default_route by default
  2. if only 1 network is defined, DNS and Gateway will default to xxx.xxx.xxx.1
  3. if only 1 network is defined, IP should default to ansible_host
  4. if a disk is declared, it will default to the first FStype (ext4, ufs, etc...)
  5. if a disk is declared, it will default mount to /mnt/${ disk name }
  6. if a disk is declared, it will default to the pool_name
  7. if a disk is declared, and it's a swap it should default mount_point to none without having to declare it
  8. pool_name should default to default
  9. if only 1 hypervisor is declared, VMs should default to it.

Ideally a simple inventory without many variables should be able to declare VMs in a simple manner like:

all:
  vars:
     disk_source: "/mnt/template.qcow2"
     os_family: RedHat
  hosts:
    terraform_node:
      ansible_host: localhost
      ansible_connection: local
    hypervisor_1:
      ansible_host: 10.90.20.12
      ansible_user: root
  children:
    deploy:
      children:
        my_nodes:
          hosts:
            node-0:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.200
              data_disks:
                disk-1:
                  size: 5
...

Or even simpler if no external disks are declared

all:
  vars:
     disk_source: "/mnt/template.qcow2"
     os_family: RedHat
  hosts:
    terraform_node:
      ansible_host: localhost
      ansible_connection: local
    hypervisor_1:
      ansible_host: 10.90.20.12
      ansible_user: root
  children:
    deploy:
      children:
        my_nodes:
          hosts:
            node-0:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.200
            node-1:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.201

....

Cloud-Init support

Hi, does the terraform-libvirt template file support cloud init configuration?

I was using a cloud-init base image and the deploy is stuck on the task TASK [terraform : Terraform apply VMs]

Implement custom terraform provisioners

We should be able to declare per-hypervisor, per-group or per-vm provisioners for terraform; it would be useful for this to be a simple list:

terraform_custom_provisioners:
    - "pkg install python3"
    - "pw useradd user1 -g group1 -s /usr/local/bin/bash"
    - "pw usermod user1 -G wheel,www-data"

This should be interpreted in the Jinja2 template of the terraform code in a way similar to this:

if terraform_custom_provisioners is declared:
    for provisioner in terraform_custom_provisioners:
              GENERATE_TF_CODE_PROVISIONER
    endfor
endif

This could be really useful to support different OS (Like BSD, Solaris etc...) and to perform critical actions before using ansible on the VMs (one example, installing python3 where not present)
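
In the Jinja2 template this could translate to something like the following sketch, using Terraform's standard remote-exec provisioner:

{% if terraform_custom_provisioners is defined and terraform_custom_provisioners %}
provisioner "remote-exec" {
  inline = [
{% for provisioner in terraform_custom_provisioners %}
    "{{ provisioner }}",
{% endfor %}
  ]
}
{% endif %}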

Make terraform_node work on remote setups

Right now, terraform node only works on local setups,

we would want to have a separate terraform node that deploys the VM on the KVM host,

changing the terraform_node from a var to a special host in each hypervisor_X group.

the possible use cases are:

  1. local(ansible, terraform, KVM) -> bastion False, this is the easiest case
  2. local(ansible, terraform), remote(KVM) -> bastion True, easy case, set bastion to the KVM machine to access local VMs
  3. local(ansible), remote(terraform, KVM) -> bastion True, works the same as above

  4. local(ansible), remote(terraform), remote_2(KVM) -> UNSUPPORTED

this right now, is what we have to change to support the 4th case:

  • separate bastions for terraform and ansible - in case the KVM is directly reachable from the ansible machine
  • double bastion for ansible - in case the KVM is not directly reachable, but only via the terraform node.

[IMPROVEMENT] Support non-root users

We should support non-root users for both ansible and terraform.

Also for terraform support use of ssh-keys (those have to be already in the template you're using)

[BUG] Docker file should use the correct source

Right now, we should NOT use the zip release to create dockers, but the very source that is checked out during the action

Affected files are

.github/workflows/docker-build.yml
.github/workflows/docker-tag-release.yml

Add VM-resource reload when changing the terraform file

This should be useful for adding/removing ifaces and passthrough in the future.

Add

- name: "Terraform renew target VMs"
  terraform:
    project_path: "{{ hcl_deploy_path }}/{{ inventory_hostname }}"
    force_init: true
    state: absent
    targets: "libvirt_domain.domain-terraform"
  tags: deploy, apply, generate_hcl
  when: terraform_status.changed
  delegate_to: terraform_node
