Infrastructure as Code using Terraform

Sun, Jul 23, 2017

One of the strategic benefits of cloud computing is programmable infrastructure, or “Infrastructure as Code”. So what exactly does this mean?

Historically, infrastructure has been provisioned through a combination of shell scripts and manual operations. Once a computing environment is built, it’s often tweaked and modified over months to address various issues, which can leave it in an unknown state. Tracking its state (what was changed, why, and by whom) is hard. Rebuilding or cloning an entire environment could take days, if not weeks.

“Infrastructure as Code” provides a solution to this problem. It builds on the APIs exposed by cloud providers like AWS and Azure to provision an entire computing environment, both the application code AND the underlying infrastructure it runs on (networks, servers, load balancers), in a repeatable and versioned manner.

It’s important to note that this code can be stored in a version control system and reviewed like application code, which gives us a full history of every change. The result of running the code, the current state of the infrastructure, can also be saved and reviewed whenever needed.

One tool that makes this possible is Terraform from HashiCorp, the company behind Vagrant, Consul and Vault.

To get a feel for Terraform as a tool, I’ve put together a small auto-scaling microservice project on Amazon Web Services. Hopefully this project will be simple enough to introduce some interesting concepts and challenging enough to help you get your hands dirty!

Note: The really interesting part of this project is that the microservice is deployed to infrastructure that dynamically resizes itself (i.e., “auto-scales”) when load increases.

If this interests you, follow along. You will need an Amazon Web Services account to run this project, but it should stay within the AWS free tier, which is always nice.

About Terraform:

It’s DSL-based: it uses a domain-specific language called HCL (HashiCorp Configuration Language).
It’s cloud-agnostic: the same HCL syntax, with provider-specific resource types, can provision resources in AWS, Microsoft Azure or DigitalOcean.
It’s declarative: You declare resources in code and Terraform figures out the dependency graph and handles provisioning them.
It’s CLI-driven:

terraform plan shows you what Terraform will attempt to do, helping to prevent unintentional changes.
terraform apply actually provisions the resources.
terraform destroy deletes them.

It’s stateful: Terraform keeps track of provisioned resources either locally (terraform.tfstate file) or on a remote backend (like Amazon S3 for easier collaboration).
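To make the “declarative” point concrete, here is a minimal Python sketch of the idea behind a dependency graph like Terraform’s: each resource lists the resources it references, and a topological sort yields an order in which they can be created, with the reverse order used for destroying them. The resource names and dependencies are hypothetical, loosely modeled on this project, and this is not Terraform’s actual implementation. It assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical resources loosely modeled on this project.
# Each key maps to the set of resources it references (its dependencies).
resources = {
    "s3_bucket": set(),
    "s3_object_code_zip": {"s3_bucket"},
    "user_data_template": {"s3_bucket"},
    "launch_configuration": {"user_data_template"},
    "elb": set(),
    "autoscaling_group": {"launch_configuration", "elb"},
    "scale_out_policy": {"autoscaling_group"},
    "sns_topic": set(),
    "cpu_alarm": {"autoscaling_group", "scale_out_policy", "sns_topic"},
}


def creation_order(graph):
    """One valid creation order: every resource comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def destruction_order(graph):
    """Destroy in reverse, so dependents are removed before what they depend on."""
    return list(reversed(creation_order(graph)))


def creation_waves(graph):
    """Group resources into waves whose members could be created concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already created
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for wave in creation_waves(resources):
        print("create:", ", ".join(wave))
```

Resources in the same wave do not depend on each other, which is why a tool like Terraform can create them in parallel instead of strictly one at a time.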

Project Outline:

Deploy a sample microservice (“hit-counter”) to an autoscaling group behind an elastic load balancer.
Test autoscaling by increasing load on the instances and watching the group scale out automatically.
Optional: Be notified of autoscaling events via email. Because you need more email, yes? :)

Before running the project:

Configure your local machine with AWS credentials so that Terraform can interact with your account.

If you don’t already have an SSH key pair, you will need to generate one to log in to your EC2 instances.

Okay, let’s get to it!

Overview diagram:

Inputs:

Clone or download this GitHub repo: https://github.com/codelusion/terraform-autoscaling
Come up with a unique bucket name (S3 bucket names must be globally unique).
Supply a public key string for SSH access.
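Only AWS can tell you whether a bucket name is already taken, but the basic naming rules can be checked locally before running Terraform. The Python sketch below covers only a subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no adjacent dots; not formatted like an IP address). The function name is hypothetical, and the AWS documentation remains the authoritative list of rules.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules. Cannot check global uniqueness."""
    if not 3 <= len(name) <= 63:
        return False
    if _BUCKET_NAME.fullmatch(name) is None:
        return False
    if ".." in name:  # no two adjacent periods
        return False
    try:
        ipaddress.IPv4Address(name)  # names like "192.168.5.4" are not allowed
    except ValueError:
        return True  # not an IP address, so all checked rules pass
    return False


if __name__ == "__main__":
    for candidate in ["hit-counter-demo-2017", "Hit-Counter", "192.168.5.4"]:
        print(candidate, looks_like_valid_bucket_name(candidate))
```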

Script outline:

Creates an S3 bucket and uploads zipped code (hit-counter.zip) to it.
Creates a user-data (bootstrap) script:
- The cloud-init (user data) script is filled in with the S3 bucket name using Terraform templating.
- It installs Node.js, PM2 and the microservice code, then launches the service.
Creates an Auto Scaling group (ASG).
Creates an Elastic Load Balancer (ELB).
- The service listens on port 3000; the ELB accepts traffic on port 80 and forwards it to port 3000.
Creates a CloudWatch alarm that tracks instance metrics (e.g., CPU utilization above 50% for 1 minute).
Attaches an autoscaling policy to the ASG that increases the instance count by 1 when the alarm fires.
Creates an SNS topic that tracks the alarm status, so we can subscribe and be notified when instances are created or terminated.
- The output of the Terraform script is the final ELB URL.
The current state of the infrastructure is stored in the file terraform.tfstate.
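The alarm and scaling policy in this outline boil down to a simple rule: when average CPU stays above 50% for one 1-minute period, add one instance. The Python sketch below is a deliberately simplified, hypothetical model of that rule, not how CloudWatch or Auto Scaling are implemented. The class name, the max_size cap of 3 and the 5-minute cooldown between scaling actions are assumptions for illustration, and the CPU series is taken as given (in reality, adding instances would bring the average back down).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScaleOutModel:
    """Hypothetical, simplified model: CPU alarm -> 'add one instance' policy."""

    threshold: float = 50.0      # alarm when average CPU (%) is above this
    evaluation_periods: int = 1  # consecutive 1-minute periods that must breach
    adjustment: int = 1          # instances added per scaling action
    cooldown_minutes: int = 5    # assumed wait between scaling actions
    max_size: int = 3            # assumed upper bound of the group

    def simulate(self, cpu_per_minute: List[float], start_capacity: int) -> List[int]:
        """Return the group's instance count at the end of each minute."""
        capacity = start_capacity
        breaching = 0                      # consecutive breaching periods so far
        last_action: Optional[int] = None  # minute of the last scaling action
        history = []
        for minute, cpu in enumerate(cpu_per_minute):
            breaching = breaching + 1 if cpu > self.threshold else 0
            in_alarm = breaching >= self.evaluation_periods
            cooled_down = last_action is None or minute - last_action >= self.cooldown_minutes
            if in_alarm and cooled_down and capacity < self.max_size:
                capacity = min(capacity + self.adjustment, self.max_size)
                last_action = minute
            history.append(capacity)
        return history


if __name__ == "__main__":
    # Two quiet minutes, then a sustained 100% CPU spike (e.g. a stress test).
    cpu = [10, 10, 100, 100, 100, 100, 100, 100, 100, 100]
    print(ScaleOutModel().simulate(cpu, start_capacity=1))
    # -> [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
```

With these assumed settings, the first breaching minute adds an instance, the cooldown holds the count steady for a few minutes, and the group then grows once more before hitting its cap.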

Terraform demo:

Exercises for the reader:

Update the “desired capacity” (number of instances) and run terraform apply to apply the change on the fly.
To test autoscaling, a stress-testing tool (stress) is installed on each instance. Log in to one of the instances and run stress -c 2 to generate 100% CPU utilization; this should trigger autoscaling after about a minute.
Destroy the entire stack using terraform destroy.
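Both the “desired capacity” exercise and terraform destroy come down to comparing what the configuration declares with what the state file records. The Python sketch below shows that comparison in its simplest form, classifying each resource as create, update or destroy and listing the changed attributes. The resource addresses and attribute values are hypothetical. Real Terraform also refreshes the recorded state against the provider, orders the actions using its dependency graph, and replaces a resource instead of updating it when an attribute cannot be changed in place.

```python
from typing import Dict, List, Tuple

Attrs = Dict[str, object]
Change = Tuple[object, object]  # (old value, new value); None means absent


def changed_attributes(old: Attrs, new: Attrs) -> Dict[str, Change]:
    """Return every attribute whose value differs between old and new."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


def plan(desired: Dict[str, Attrs], state: Dict[str, Attrs]) -> List[Tuple[str, str, Dict[str, Change]]]:
    """Compare desired configuration with recorded state, resource by resource."""
    steps = []
    for address in sorted(desired.keys() | state.keys()):
        old, new = state.get(address), desired.get(address)
        if old is None:
            action = "create"
        elif new is None:
            action = "destroy"
        elif old != new:
            action = "update"
        else:
            continue  # already matches: nothing to do
        steps.append((action, address, changed_attributes(old or {}, new or {})))
    return steps


if __name__ == "__main__":
    # Hypothetical recorded state after the first apply.
    state = {
        "aws_autoscaling_group.web": {"min_size": 1, "max_size": 3, "desired_capacity": 1},
        "aws_sns_topic.alerts": {"name": "asg-alerts"},
    }
    # Same configuration, but with desired capacity bumped from 1 to 2.
    desired = {
        "aws_autoscaling_group.web": {"min_size": 1, "max_size": 3, "desired_capacity": 2},
        "aws_sns_topic.alerts": {"name": "asg-alerts"},
    }
    for step in plan(desired, state):
        print(step)
    # Destroying everything is a plan against an empty configuration.
    for step in plan({}, state):
        print(step)
```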

Summary:

Managing computing environments with the “infrastructure as code” approach is rapidly becoming a de facto standard. It’s one of the core concepts to grasp in order to build agile, elastic, cloud-native software systems. I hope you find this project helpful in your learning journey!