Saturday, July 7, 2018

An Intro to Infrastructure as Code with Terraform and Ansible

1. Problem:
In the past, tasks related to hardware and system administration were fairly separated from those of software development, i.e., assigned to different teams and individuals. This often led to so-called "dev vs. ops" wars, with developers throwing their code over the wall to operations for deployment and monitoring, and, vice versa, operations complaining about improper documentation and filing tickets to ask a developer to solve encountered issues/bugs. It also meant developers making frequent changes and wishing to see them applied quickly, while operations people cared about stability and thus slowed down the deployment process. Besides the different skill sets required by the two profiles, the main reason was the largely manual (and non-portable) nature of system administration and software releases in general.

The advent of cloud computing, with services such as AWS and Google Cloud, forced an abstraction over computing resources and therefore a shift from bare-metal management to more abstract environments. Moreover, we now generally handle pools of distributed resources, e.g. computing clusters, rather than single servers. This required newer configuration tools to ease their setup and maintenance, so-called infrastructure-as-code tools. These made the job of the operations team closer to that of coding, with many actually advocating for a single "DevOps" profile [1]. The main benefit is the reduction of communication overhead between the two roles and, consequently, more frequent code releases. As such, the DevOps movement is closely related to agile development [2, 3] and to continuous integration and continuous deployment [4], being a set of practices to automate integration testing to detect the introduction of code errors, as well as to speed up the deployment of new functionality by also automating infrastructure setup and management.

2. Benefits:
The main benefits of infrastructure-as-code tools are less manual work for operations and more automated and reproducible actions, which provide implicit documentation for those sharing the scripts. They also enable common software development practices, such as version control and code review, as well as modularity and reusability: for instance, the pipeline performing integration and/or deployment for the DEV branch can later be reused for TEST and PROD. As mentioned, this speeds up the whole development process, in practice bringing new functionality into production faster.

It is also important to distinguish server provisioning tools from configuration management tools: the former mainly specify the environment in terms of computing resources (though to some extent software dependencies can often be defined too), while the latter are used to set up the actual runtime environment. Accordingly, provisioning tools such as Terraform only provide the computing environment, and delegate to tools like Docker the definition of an image to be run across all nodes, which will be of the exact same kind until the individual containers are run and modifications are made. This approach is commonly referred to as the immutable infrastructure paradigm. Conversely, configuration management tools define in an imperative or procedural way the steps necessary to achieve the desired runtime environment; this approach is referred to as the mutable infrastructure paradigm, as each separate server is configured in place and will thus have its own history, potentially leading to misalignments across configurations. Another aspect to remark is that certain tools require installing specific agents to perform configuration updates, and might also require electing a master node that manages the infrastructure state globally and either i) instructs the updates on the agents (push), or ii) lets the agents pull updates from the master (pull). Conversely, tools such as Ansible do not define a master node, i.e., actions can be run from any machine, with the risk of running different versions of the configuration script.

3. Terraform:

Terraform is an open-source tool written in Go and meant to abstract the setup and management of multiple cloud providers, such as AWS and Azure. Accordingly, the developer writes portable configuration files which Terraform then actuates using the specific cloud provider's API. Specifically, code is written in the HashiCorp Configuration Language (HCL), a declarative language whose files have extension .tf.

3.1 Definition file structure
A Terraform definition file generally consists of a provider and a bunch of resources [5]:

  • the provider specifies which underlying client should be used, based on the selected cloud service, e.g. aws;
    provider "PROVIDER" {
       [properties ...]
    }
  • each resource consists of a resource type, which determines the configuration parameters to be provided, and a resource name;
    resource "TYPE" "NAME" {
       [configuration parameters ...]
    }
    The resource can then be referred to across the specification file using the notation "${TYPE.NAME.ATTRIBUTE}".
  • input variables can be specified, along with their type, a description comment, and a default value (the type defaults to string if omitted);
    variable "NAME" {
       type = ""
       description = ""
       default = [DEFAULT_VALS ...]
    }
    If no default value is defined, the variables must be supplied when running Terraform (i.e. with either plan or apply), either interactively or using -var NAME="value". Variables can be accessed using the notation "${var.NAME}". A complete definition file combining these elements is sketched right below.
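
As a minimal sketch combining these elements (the region, AMI ID and all names below are illustrative placeholders, not prescribed values), a complete definition file, e.g. main.tf, could look as follows:

    # Assumed: AWS credentials already configured; region and AMI ID are placeholders.
    provider "aws" {
       region = "us-east-1"
    }

    # Input variable with a default, accessed below as "${var.instance_name}".
    variable "instance_name" {
       type        = "string"
       description = "Name tag for the example instance"
       default     = "terraform-example"
    }

    # A resource of type aws_instance named "example"; its attributes can be
    # referenced elsewhere in the file, e.g. as "${aws_instance.example.id}".
    resource "aws_instance" "example" {
       ami           = "ami-0c55b159cbfafe1f0"
       instance_type = "t2.micro"
       tags {
          Name = "${var.instance_name}"
       }
    }
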
If the specified provider is not yet available, i.e., its client has not been configured, it can be set up automatically by moving to the folder containing the definition file and running "terraform init". Once the provider and resources are specified, the expected changes can be shown using "terraform plan". Terraform thus has a planning step in which it generates an execution plan, letting the user understand what changes will be made before they actually take place. A resource graph is also built over all resources, both to best parallelise their creation and to make their dependencies visible to users; it can be displayed by running "terraform graph". This way Terraform keeps track of all resources previously created and only performs delta updates when the specification file changes. Specifically, Terraform creates a terraform.tfstate file describing the current infrastructure state in JSON format, based on which deltas are planned. Clearly, concurrent access to this shared state file can lead to race conditions; for this purpose, remote state storage can be configured in Terraform, e.g. on Amazon S3 or a few other distributed storage services. We refer to [5] for advanced Terraform topics, such as team collaboration and zero-downtime deployment.
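
Putting it together, assuming the definition file is in the current folder, the typical command sequence looks as follows ("terraform apply" is the step that actually enacts the plan):

    terraform init     # set up the provider client/plugin
    terraform plan     # show the execution plan, i.e. the delta against terraform.tfstate
    terraform apply    # enact the plan, creating/updating the resources
    terraform graph    # print the resource graph (in DOT format)
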

I follow this short intro with a Terraform tutorial on GitHub.

4. Ansible:

Ansible is an open-source tool to manage the configuration and application deployment of networked systems. Its main competitors are Chef, Puppet and Salt; Ansible stands out due to its simplicity. Ansible builds on the existing SSH protocol to execute commands on remote machines and allows for the orchestration of networked services. Being written in Python, Ansible's main dependency is a Python 2.6+ interpreter. We can distinguish two paradigms: i) push and ii) pull. In a pull architecture, tool-specific agents periodically poll a central repository for updates, whereas in the push counterpart a master initiates the process. As such, Ansible only needs to be installed on the controlling machine and requires nothing but SSH on the target nodes. While a similar behavior could be achieved by opening multiple parallel SSH connections (e.g. using iTerm on macOS), what Ansible actually provides is an abstraction layer over commands, the modules, which can be written in any scripting language and offer a system-agnostic interface to common system commands, as well as the important property of idempotency. Idempotency guarantees that an operation, even when repeated multiple times, will always leave the system in the same state.
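
As a quick taste of modules and idempotency, the following ad-hoc command (a sketch: the "webservers" group, the "hosts" inventory file and the nginx package are illustrative placeholders) uses the apt module to ensure a package is installed on all hosts of a group; running it a second time reports "ok" instead of "changed", since the target state is already met:

    # Run the apt module against the webservers group of the hosts inventory,
    # escalating privileges on the remote nodes with --become (i.e. sudo).
    ansible webservers -i hosts -m apt -a "name=nginx state=present" --become
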


4.1 Inventory
An inventory is a file listing the nodes to be accessed to carry out a certain configuration process. Hosts are associated to groups (e.g. the service or application name for a distributed service) and identified by their hostname or IP address. Authentication occurs via a password or a private key, which can also be specified in the inventory.
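
A minimal sketch of an INI-style inventory file (all hostnames, group names and credentials below are placeholders):

    [webservers]
    web1.example.com
    web2.example.com ansible_host=10.0.0.12

    [dbservers]
    # Authentication details can be given per host, here a user and a private key.
    db1.example.com ansible_user=admin ansible_ssh_private_key_file=~/.ssh/db_key
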

4.2 Playbook
A playbook expresses the configuration and orchestration process by mapping a host group from the inventory to a set of roles. Roles allow for code organization and reuse. Accordingly, each called role is actuated as a sequence of calls to Ansible modules. As such, the playbook merely specifies process variables (e.g., paths) and a top-level workflow composed of a list of basic roles (i.e., sub-workflows), each achieving a different configuration goal.
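
A minimal playbook sketch in YAML (e.g. site.yml; the group, variable and role names are illustrative, with each role expected to live under a roles/ directory):

    # Map the webservers inventory group to two roles.
    - hosts: webservers
      vars:
        app_path: /opt/myapp    # a process variable made available to the roles
      roles:
        - common
        - webserver
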


Bibliography

  1. G. Kim et al. The DevOps Handbook: How to Create World-Class Agility, Reliability & Security in Technology Organizations. IT Revolution Press, 2016.
  2. E. Ries. The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business, 2011.
  3. M. Poppendieck et al. Lean Software Development: An Agile Toolkit. Addison-Wesley, 2003.
  4. J. Humble and D. Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation (1st ed.). Addison-Wesley Professional, 2010.
  5. Y. Brikman. Terraform: Up & Running. Writing Infrastructure as Code. O'Reilly, 2017.
  6. L. Hochstein. Ansible: Up and Running. Automating Configuration Management and Deployment the Easy Way. O'Reilly, 2015.
  7. B. Beyer et al. Site Reliability Engineering: How Google Runs Production Systems. O'Reilly, 2016.
