Ansible is great for configuration management and does a good job of creating an environment that is (mostly) free of configuration drift. Using AWX, I’ve created a fair number of playbooks and roles which are executed regularly to keep things up to date and humming along.

Terraform is a tool designed to enable Infrastructure-as-Code (IaC), meaning that the infrastructure is declaratively defined in code which can be managed by code versioning tools such as git. As the code changes, Terraform lets you see what changes are planned before applying them in a controlled manner.

Ansible can also be used for IaC because it supports idempotency, but the primary difference is that it's stateless. Once you change the code, the next time it's run the new state will be enforced. There is no visibility into the current state or what will change from it (although --check sort of does this). Terraform offers that visibility.
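
For comparison, the closest Ansible equivalent is a dry run; adding --diff makes the would-be changes visible (site.yml here is just a placeholder for whatever playbook you normally run):

$ ansible-playbook site.yml --check --diff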

I did some research into ways to integrate Terraform and Ansible to provide the best of both worlds. Since I’m already invested in using AWX as the mechanism for scheduling and running playbooks, I chose to go with calling Terraform from Ansible playbooks. It can be used the other way as well: use Terraform to provision the infrastructure and have it run an Ansible playbook as a post-provisioning task to configure it.
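
For reference, the other direction looks roughly like the sketch below: a local-exec provisioner on the resource shells out to ansible-playbook once the VM exists. The configure.yml playbook name is just a placeholder.

resource "proxmox_vm_qemu" "example" {
  # ... VM definition, as in main.tf further down ...

  # Run Ansible against the new VM's address once Terraform has created it
  provisioner "local-exec" {
    command = "ansible-playbook -i '${self.default_ipv4_address},' configure.yml"
  }
}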

My use case for the proof-of-concept was to leverage the Proxmox API to create a VM.

Inside my Ansible repository, I created a terraform directory with subdirectories to contain my Terraform projects. Inside the pve project, I created a terraform.tfvars file containing the values for calling the Proxmox API.
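
The layout ends up looking roughly like this (only the pve project exists so far):

terraform/
└── pve/
    ├── main.tf
    └── terraform.tfvars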

The variables file contains sensitive information, so it was encrypted with ansible-vault before committing to git (the original, unencrypted file is added to .gitignore). If the unencrypted version doesn't exist, the playbook will decrypt it.
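
For reference, the encryption is a one-liner; the encrypted copy gets a distinct name so the playbook below can tell the two apart:

$ ansible-vault encrypt terraform/pve/terraform.tfvars --output terraform/pve/terraform.tfvars-encrypted
$ echo "terraform/pve/terraform.tfvars" >> .gitignore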

terraform.tfvars:

pve_api_host = "192.168.1.x:8006"
pve_api_url = "https://192.168.1.x:8006/api2/json"
pve_user = "terraform@pve"
pve_pass = "redacted"

The main.tf contains the declarations:

main.tf:

variable "pve_api_url" {} 
variable "pve_user" {}
variable "pve_pass" {}

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.7.3"
    }
  }
}

provider "proxmox" {
  pm_api_url = var.pve_api_url
  pm_user = var.pve_user
  pm_password = var.pve_pass
  pm_tls_insecure = true
  pm_log_enable = true
  pm_log_file = "terraform-plugin-proxmox.log"
  pm_log_levels = {
    _default = "debug"
    _capturelog = ""
  }
}


resource "proxmox_vm_qemu" "vmtest" {

    name = "tftest"
    target_node = "node2"
    iso = "qnap-iso:iso/debian-10.3.0-amd64-netinst.iso"
    os_type = "ubuntu"
    os_network_config = <<EOF
auto eth0
iface eth0 inet dhcp
EOF
    disk { // This disk will become scsi0
      type = "scsi"

      storage = "vmdata"
      size = "10G"
    }
}
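
One note: the playbook further down loops over an ipv4_address output, which main.tf above doesn't define. For that lookup to work, you'd add an output along these lines (a sketch, built on the provider's default_ipv4_address attribute that shows up in the plan output below):

output "ipv4_address" {
  value = [proxmox_vm_qemu.vmtest.default_ipv4_address]
}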

The first step is to initialize the project, which downloads the required providers and prepares the working directory (the terraform.tfstate file itself gets created when the configuration is first applied).

terraform init

Then 'terraform plan' will look at the current state and display what is to be done:

$ terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_vm_qemu.vmtest will be created
  + resource "proxmox_vm_qemu" "vmtest" {
      + additional_wait           = 15
      + agent                     = 0
      + balloon                   = 0
      + bios                      = "seabios"
      + boot                      = "cdn"
      + bootdisk                  = (known after apply)
      + clone_wait                = 15
      + cores                     = 1
      + cpu                       = "host"
      + default_ipv4_address      = (known after apply)
      + define_connection_info    = true
      + force_create              = false
      + full_clone                = true
      + guest_agent_ready_timeout = 600
      + hotplug                   = "network,disk,usb"
      + id                        = (known after apply)
      + iso                       = "qnap-iso:iso/debian-10.3.0-amd64-netinst.iso"
      + kvm                       = true
      + memory                    = 512
      + name                      = "tftest"
      + nameserver                = (known after apply)
      + numa                      = false
      + onboot                    = true
      + os_network_config         = <<-EOT
            auto eth0
            iface eth0 inet dhcp
        EOT
      + os_type                   = "ubuntu"
      + preprovision              = true
      + qemu_os                   = "l26"
      + reboot_required           = (known after apply)
      + scsihw                    = (known after apply)
      + searchdomain              = (known after apply)
      + sockets                   = 1
      + ssh_host                  = (known after apply)
      + ssh_port                  = (known after apply)
      + target_node               = "athena"
      + unused_disk               = (known after apply)
      + vcpus                     = 0
      + vlan                      = -1
      + vmid                      = (known after apply)

      + disk {
          + backup       = 0
          + cache        = "none"
          + file         = (known after apply)
          + format       = (known after apply)
          + iothread     = 0
          + mbps         = 0
          + mbps_rd      = 0
          + mbps_rd_max  = 0
          + mbps_wr      = 0
          + mbps_wr_max  = 0
          + media        = (known after apply)
          + replicate    = 0
          + size         = "10G"
          + slot         = (known after apply)
          + ssd          = 0
          + storage      = "vmdata"
          + storage_type = (known after apply)
          + type         = "scsi"
          + volume       = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

Once happy with the plan, just execute 'terraform apply'. After creation, the resource's configuration can be changed as the infrastructure needs change and the lifecycle managed entirely with Terraform and Git. Without changing the code, the infrastructure can be de-provisioned with 'terraform destroy'.
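
As the note in the plan output suggests, the plan can also be saved to a file and applied exactly as generated, which is the safer habit once this moves into automation:

$ terraform plan -out=tfplan
$ terraform apply tfplan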

For the Ansible part, I just need a playbook which calls Terraform:

---
- hosts: localhost
  name: Create infrastructure with Terraform
  vars:
    terraform_dir: terraform/pve

  pre_tasks:
    - name: Decrypt variables file
      ansible.builtin.copy:
        src: "{{ terraform_dir }}/terraform.tfvars-encrypted"
        dest: "{{ terraform_dir }}/terraform.tfvars"
        decrypt: yes
                                           
  tasks:
    - name: Create infrastructure with Terraform
      community.general.terraform:
        project_path: "{{ terraform_dir }}"
        state: present
      register: outputs
                                                     
    - name: Add new infrastructure to inventory
      block:
        - name: Add public DNS to host group
          add_host: 
            name: "{{ item }}" 
            groups: pve-vms
          loop: "{{ outputs.outputs.ipv4_address.value }}"
      when: outputs is iterable
      rescue:
        - name: Print when errors
          ansible.builtin.debug:
            msg: "No new infrastructure provisioned"

- hosts: pve-vms
  gather_facts: false
                                                                   
  tasks:
    - name: Wait for instances to become reachable over SSH
      wait_for_connection:
        delay: 60
        timeout: 600
    - name: Ping
      ping:
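
Running it locally looks something like this (the playbook file name is whatever you saved it as; a vault password is needed for the decrypt step, and in AWX this just becomes a job template with a vault credential attached):

$ ansible-playbook create-pve-vm.yml --ask-vault-pass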

This was a simple example to test out using Terraform with the PVE API. It did create the VM as expected, but the VM booted off the installation media and sat waiting for user input, so the remainder of the playbook couldn't execute. In a real use case, I would go through the installation process, configure cloud-init, and convert the VM into a template for cloning, as I did with the RancherOS VMs, before creating a new VM from the template.
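
For that template-based flow, the resource definition swaps the iso argument for a clone, roughly like this (the template name is hypothetical; the other arguments are the same ones the provider exposes above):

resource "proxmox_vm_qemu" "vm_from_template" {
  name        = "tfclone"
  target_node = "node2"
  clone       = "debian10-cloudinit"  # hypothetical template name
  full_clone  = true
  cores       = 1
  memory      = 1024

  disk {
    type    = "scsi"
    storage = "vmdata"
    size    = "10G"
  }
}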

Dynamic Inventory

With VMs now being created and destroyed programmatically, it's time to use a dynamic inventory plugin.

plugin: community.general.proxmox
validate_certs: 'no'
url: https://192.168.1.x:8006
user: ansible@pve
password: password
keyed_groups:
  - key: proxmox_tags_parsed
    separator: ""
    prefix: group
groups:
  webservers: "'web' in (proxmox_tags_parsed|list)"
  mailservers: "'mail' in (proxmox_tags_parsed|list)"
compose:
  ansible_port: 22

Then I can get a list of dynamic groups:

$ ansible-inventory -i inventory/my.proxmox.yml --graph
@all:
  |--@proxmox_all_lxc:
  |--@proxmox_all_qemu:
  |  |--kubein
  |  |--kubemgr
  |  |--kubenode1
  |  |--kubenode2
  |  |--kubenode3
  |--@proxmox_all_running:
  |  |--kubein
  |  |--kubemgr
  |  |--kubenode1
  |  |--kubenode2
  |  |--kubenode3

These groups can now be used in a playbook:

- name: Test ssh connectivity
  hosts: proxmox_all_running
  gather_facts: false
  ignore_errors: yes
  remote_user: ansible
  tasks:
    - ping:
    - name: Run date command
      command: date

$ ansible-playbook -i inventory/ ping-vms.yml

PLAY [Test ssh connectivity] ****************************************************************************************************************************************************************************************

TASK [ping] *********************************************************************************************************************************************************************************************************
ok: [kubein]
ok: [kubenode3]
ok: [kubenode2]
ok: [kubemgr]
ok: [kubenode1]

TASK [Run date command] *********************************************************************************************************************************************************************************************
changed: [kubenode2]
changed: [kubenode1]
changed: [kubein]
changed: [kubenode3]
changed: [kubemgr]

PLAY RECAP **********************************************************************************************************************************************************************************************************
kubein                     : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
kubemgr                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
kubenode1                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
kubenode2                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
kubenode3                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

The next playbook I'm working on will install GlusterFS, allocate ZFS datasets to hold the data bricks, and create the Gluster volumes.

Gluster will provide shared storage between the nodes, since the NFS mount on the QNAP will be going away at some point. The existing VMs use local storage, and live migration isn't supported without shared storage.
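
The shape of that playbook will be the usual Gluster bootstrap, roughly the commands below wrapped in Ansible tasks (pool, brick, and node names are placeholders for the PVE nodes):

# On each node: a ZFS dataset to hold the brick
$ zfs create -o mountpoint=/gluster/brick1 tank/gluster-brick1

# From one node: peer the others and create a replicated volume
$ gluster peer probe node2
$ gluster peer probe node3
$ gluster volume create gv0 replica 3 node1:/gluster/brick1/gv0 node2:/gluster/brick1/gv0 node3:/gluster/brick1/gv0
$ gluster volume start gv0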