Using Ansible: how to structure YAML and projects

I find Ansible very easy to jump into, but the many different ways to structure projects can make it hard to work out how best to structure things. This is partly because Ansible has changed over time in ways that substantially altered the best way to accomplish things. I am going to briefly explain the YAML file format, since it rarely seems to get covered, and then go over the methods I use for structuring an Ansible project as a whole.

Ansible is a frequent part of my projects

To see how I have put Ansible to use, check out my writeup on building a Kubernetes cluster on DigitalOcean (Cluster built with ansible). That was just one use for it. I also have an Ansible playbook for building my development machine (Build an Ubuntu dev machine) so that I can quickly restore or rebuild it if I have a major issue, and other playbooks for tasks like Docker image builds and Raspberry Pi installation.

How the YAML format works

YAML and JSON are mostly interoperable; YAML is essentially a more human-readable form, and a YAML parser will generally accept JSON as well. Think of YAML as JSON without the braces and brackets, plus a few other changes.
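
To make the comparison concrete, here is a small sketch of the same data written twice: first in JSON-style syntax, then in the block style used in the rest of this post. The key names are made up for illustration.

```yaml
# JSON-style (flow) syntax with braces, brackets, and commas
server_json: {"name": "web1", "ports": [80, 443]}

# The same structure in block style, using only indentation and dashes
server_yaml:
  name: web1
  ports:
    - 80
    - 443
```

Both keys hold an identical dictionary once parsed; only the notation differs.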

There are dictionaries and arrays

These are the YAML building blocks; any YAML structure is composed of these used in combination. As I mentioned, it is like JSON without the braces and brackets, which can make it a bit harder to tell where one structure ends and another begins, but also keeps the syntax very clean.

Levels of indentation define object structure -

Arrays or dictionaries at a given level of nesting must have at least as much indentation as the current block level. Since there are no braces or brackets, YAML takes the Python approach, where indentation is critical to defining structure.

Nested Dictionaries are dependent on greater nesting

A nested dictionary MUST have greater indentation to indicate it is nested. A dictionary is key-value pairs, just like a JSON object.

key: value  
key2:  
  subkey: value  
  subkey3:  
    nestedkey: value

Arrays do not require greater nesting than the defining key

An array's members do not need to be indented further than the key that defines them, so both of these are valid:

arr1:  
  - member1  
  - member2  
arr2:  
- member1  
- member2

Arrays use a dash (-) to denote each member, since YAML block style does not use brackets ([ ]) or commas the way JSON does. Since both forms are valid you will run into both; just remember that the difference is only a style preference.
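
The two building blocks combine freely. As a small sketch (the names here are hypothetical), this is an array whose members are dictionaries, which is the same shape as a list of Ansible tasks:

```yaml
users:
  - name: alice        # first array member: a dictionary with two keys
    shell: /bin/bash
  - name: bob          # the next dash starts a new dictionary
    shell: /bin/zsh
    groups:            # an array nested inside the second dictionary
      - sudo
      - docker
```

Each dash opens a new member, and the keys lined up beneath it belong to that member.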

How Ansible code is structured

It can be confusing to look at different examples because there are many ways to do it, but here are the building blocks and how they fit together.

Play -

A play is the block level above tasks. It specifies hosts and attaches one or more tasks to run on them. Even if you only wanted to run a single task, the play would hold that task, with its module and keywords, to define what it does with those hosts.

---
- hosts: groupname
  # other options (become, gather_facts, ...)
  tasks:
    # Array of tasks

The nice thing about Ansible's structure, however, is that you can break tasks off into task files or roles, so that the play specifies the target hosts just once and then tasks/roles are attached based on some logic.
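
As a minimal sketch of that idea, the play below defines one task inline and pulls the rest from a separate task file. The group name and the tasks/setup.yml path are placeholders, not files from my projects.

```yaml
---
- hosts: groupname
  become: true
  tasks:
    # A task defined directly in the play
    - name: Make sure the hosts are reachable
      ping:

    # Tasks pulled in from a separate file (hypothetical path)
    - name: Run the shared setup tasks
      import_tasks: tasks/setup.yml
```

The hosts are named once at the top, and everything attached underneath runs against them.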

Tasks -

A task uses a module, the module's options, and keywords to define an action to execute. The target hosts are determined by the containing play. If the task is in a task file, the “tasks:” heading does not need to be specified either; the file is just a list of module/keyword combinations.

  - name: Check if ripgrep is installed
    command: dpkg-query -W ripgrep
    register: check_deb

In this instance there is a module (command), a keyword (register), and the name. The name is technically a keyword too, but it appears at the start of most tasks, so the distinction is not important.

Keywords

These are modifiers available for use in tasks. They are not module specific, but may alter the task/module behavior.
Some keywords are used constantly, like name. The example above already uses two: naming a task (name) and registering the result (register) are both keywords. Other commonly used ones include loop, when, vars, tags, and become.

Keywords add generic functionality to modules

Keywords perform generally useful functionality that can apply to most modules. They can allow looping over input, saving output, making a task conditional upon various factors, and more.
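
Here is a sketch of several of those keywords working together, building on the ripgrep check from earlier. The package list and the tag name are only illustrative.

```yaml
- name: Check if ripgrep is installed
  command: dpkg-query -W ripgrep
  register: check_deb        # save the result for later tasks
  failed_when: false         # a missing package should not stop the play
  changed_when: false        # a query never changes the host

- name: Install a few packages when ripgrep is missing
  apt:
    name: "{{ item }}"
    state: present
  loop:                      # run the module once per list entry
    - ripgrep
    - fd-find
  when: check_deb.rc != 0    # conditional on the registered result
  become: true               # escalate privileges for this task only
  tags: tools                # lets you select this task with --tags
```

Nothing in the keywords is specific to command or apt; the same register, loop, and when lines would work with most other modules.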

Module

Modules specify the action for a task, which is modified with options to the module and keywords. This example is of the file module.

  file:
    path: "{{ tmp_dir }}"
    state: absent

There are a huge number of modules, so the module documentation index (Ansible All Modules Index) is your friend. Sub-keys specify module options (path and state, in this case), and these vary from module to module. Before putting together your own setup to accomplish a task, it is always best to check the module index first.

Playbook -

A file containing plays set up to run tasks. You can put everything (plays, tasks, vars …) in monolithic playbooks or break things up using roles. The latter is preferred because it takes advantage of Ansible's modular structure: you can set up reusable tasks, split out variables, files, host information and so forth, and keep a clear organization.
To run a playbook -
ansible-playbook playbook.yml

Block -

An arbitrary grouping that lets you set keywords for all tasks contained within the block. It allows a level of organization above tasks but below roles/playbooks. This example adds a when condition, so the grouped tasks are executed only when it is true.

- name: "Install Helm"
  block:
    - name: "Task 1"
      get_url:
        url: https://raw.githubusercontent.com/helm/helm/master/scripts/get
        dest: "{{ tmp_dir }}/get_helm.sh"
        mode: "0755"  # make the script executable so the next task can run it

    - name: "Task 2"
      shell: "{{ tmp_dir }}/get_helm.sh"
  # applies to every task in the block
  when: helm_exists.rc > 0
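
The when condition reads helm_exists.rc, which has to be registered by an earlier task that is not shown here. One plausible way to produce it (an assumption for illustration, not necessarily the check used in the original playbook) is:

```yaml
- name: Check whether helm is already installed
  command: which helm
  register: helm_exists
  failed_when: false    # a non-zero rc just means "not installed"
  changed_when: false   # this check never modifies the host
```

With this in place, the block runs only when the lookup returned a non-zero exit code.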

Roles -

A set of tasks with all the supporting vars, files, and handlers organized into separate files but readily available. The tasks are also in their own file. The tasks from a role are then called in by an organizing playbook. Each category (tasks, vars, handlers) gets its own folder, and the files inside are assumed to contain just that type of content.

The location specifies what the file contents will be

Files in the tasks folder do not need a tasks: heading; each is assumed to be an array of tasks. Files in the vars folder do not need a vars: key; each is assumed to be a dictionary of variables. And so forth.
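
As a sketch, here are two files from a hypothetical role called role1. They are shown in one listing separated by ---, but they live in separate files on disk; the variable name and package list are made up.

```yaml
# roles/role1/tasks/main.yml: a bare array of tasks, no "tasks:" key
- name: Install the packages this role needs
  apt:
    name: "{{ role1_packages }}"
    state: present
  become: true
---
# roles/role1/vars/main.yml: a bare dictionary of variables, no "vars:" key
role1_packages:
  - git
  - curl
```

The task file can use role1_packages without any explicit import, because the role's vars are loaded alongside its tasks.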

Top level Organization for Ansible -

Top level directory and file structure -

main_inventory     - the primary inventory file. Can have many of these.
ansible.cfg        - general config file
group_vars/
  vars1            - files for group specific variables
  vars2
host_vars/
  host1            - files for host specific variables
  host2
site.yml           - a master playbook
webservers.yml     - other playbooks, optionally
roles/             - roles; can be imported/included into any playbook
  role1/
    tasks/
    vars/
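
To show how the inventory and group_vars pieces connect, here is a minimal sketch with made-up host names and a made-up variable value. It assumes the inventory is written in YAML and saved with a .yml extension, and relies on the fact that a file in group_vars applies to the group whose name it carries. The two files are shown in one listing separated by ---.

```yaml
# main_inventory.yml (hypothetical name): defines the group "nodes"
all:
  children:
    nodes:
      hosts:
        node1.example.com:
        node2.example.com:
---
# group_vars/nodes.yml: variables applied to every host in "nodes"
tmp_dir: /tmp/ansible-work
```

Any play that targets hosts: nodes can then reference tmp_dir without declaring it.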

I am not a fan of the roles naming convention

For a task, var, handler, or any other file to be loaded by default, it must be called ‘main’ in its directory. This leads to lots of files all called ‘main’. I mostly give them other names and explicitly import/include by those names.

Importing roles in a playbook

You just specify the hosts in the main playbook, along with the role/tasks to import and run for those hosts.

- hosts: nodes
  # Old role import syntax; extra keys such as task are passed to the role as variables
  roles:
    - { role: tools, task: reset }
    - { role: common, task: all }
  # New role import syntax; import or include.
  tasks:
    - name: Import a specific task file from role
      import_role:
        name: myrole
        tasks_from: taskfile
    - name: Include a specific task file from role
      include_role:
        name: other_role/or/path
        tasks_from: othertaskfile

There are different ways to add roles; I am demonstrating three here. Roles can also be included conditionally based on variables, previous output, or tags.
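
A minimal sketch of a conditional include, assuming a role named tools exists and using a made-up variable install_tools as the switch:

```yaml
- hosts: nodes
  vars:
    install_tools: true   # hypothetical toggle; could come from group_vars instead
  tasks:
    - name: Include the role only when the variable is set
      include_role:
        name: tools
      when: install_tools | bool
```

Setting install_tools to false, for example in a host or group variable file, skips the role for those hosts.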

Then run the playbook to use those roles

When using roles, the job of the playbook ends up being to designate which hosts to target and to import the roles to run on them. This makes it easy to set up playbooks with different host targets running different roles on those targets.

Tags can allow more control when calling playbooks

For more specific control, I sometimes set up tags so I can run parts of a playbook, as opposed to setting up playbooks for each use case.
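
Here is a sketch of what that can look like. The task contents, paths, and tag names are illustrative only; the commands in the trailing comments use the standard --tags and --skip-tags options of ansible-playbook.

```yaml
- hosts: nodes
  tasks:
    - name: Install docker
      apt:
        name: docker.io
        state: present
      become: true
      tags: docker

    - name: Build the application image
      command: docker build -t myapp:latest .
      args:
        chdir: /opt/myapp
      tags:
        - docker
        - build

# Run everything:           ansible-playbook site.yml
# Run only the build task:  ansible-playbook site.yml --tags build
# Skip the docker tasks:    ansible-playbook site.yml --skip-tags docker
```

A task can carry more than one tag, so the same playbook can be sliced in several ways without duplicating it.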