Terraform – External Data Source

The Terraform external data source is an interesting feature that not many people seem to use. If you use Terraform in your day-to-day work, it is a nifty tool to have. This post explains what the external data source is and how it can be used for some really interesting stuff.

Introduction

In a nutshell, an external data source (not a great name!) fetches information from an outside source and presents it to Terraform. It executes a shell script, a Python script or, for that matter, any other program, and Terraform then uses the output of that program like any other data source. This gives Terraform a way to interface with the outside world, which can be super helpful!

Note: The external data source is independent of AWS, Azure, Google Cloud or any other cloud provider.

In this blog post, we will cover both flavours of scripts: shell scripts and Python scripts. If you are just starting out with Terraform, I suggest getting comfortable with the basics before you jump into external data sources.

Basics of External Data Source

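As mentioned, the external data source allows Terraform to interface with the outside environment. So what’s the catch (there is always a catch)? The input and the output of the program have to be JSON objects: Terraform passes the query to the program as a JSON object on standard input, and the program must print a JSON object, whose values are all strings, on standard output. To use the external data source with shell scripts you had better know jq.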
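The string-only rule for output values is easy to trip over, so here is a minimal Python sketch of it. The function names to_string_map and handle, the env key and the numeric port are all made up for illustration; only the p_env key comes from this post.

```python
import json


def to_string_map(result):
    """Coerce every value to a string, because the external data source
    expects a flat JSON object whose values are all strings."""
    out = {}
    for key, value in result.items():
        if isinstance(value, (list, tuple)):
            # Flatten sequences into a comma-separated string (an arbitrary choice).
            out[key] = ",".join(str(item) for item in value)
        else:
            out[key] = str(value)
    return out


def handle(raw_query):
    """Take the JSON text received on stdin and return the JSON text to print."""
    query = json.loads(raw_query)
    # Hypothetical result: echo the environment and attach a numeric port.
    result = {"env": query["p_env"], "port_num": 8081}
    return json.dumps(to_string_map(result))


# In a real script the last line would be:
#   print(handle(sys.stdin.read()))   # needs "import sys"
```

On the Terraform side, a flattened list like this can be turned back into a list with the split function.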

Note: External data source allows Terraform to interact with the outside environment. It is important that all the required software is installed beforehand – else this may lead to nasty surprises later. Use it only when you cannot find a solution inside Terraform.

One potential use case where I found this useful (and you may not agree) is getting information about AWS (or any other cloud) resources when the Terraform data source does not expose everything about that resource. For example, the Terraform data source for Amazon Managed Streaming for Apache Kafka (MSK) did not provide subnet information for the brokers (at least not when I wrote my code), even though this information is available via the AWS CLI or the boto library.
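To make that use case concrete, here is a rough Python sketch, not the code I actually used. It assumes the describe_cluster call of the boto3 kafka client and the ClusterInfo.BrokerNodeGroupInfo.ClientSubnets field of its response. The query key cluster_arn and the function names are hypothetical, and the sketch has not been run against a live AWS account.

```python
import json
import sys


def subnets_for_cluster(kafka_client, cluster_arn):
    """Return the broker subnets of one MSK cluster as a string-only map."""
    response = kafka_client.describe_cluster(ClusterArn=cluster_arn)
    subnets = response["ClusterInfo"]["BrokerNodeGroupInfo"]["ClientSubnets"]
    # Output values must be strings, so join the subnet ids with commas.
    return {"subnet_ids": ",".join(subnets)}


def main():
    # Imported here so the helper above can be exercised without boto3 or AWS.
    import boto3

    query = json.load(sys.stdin)  # e.g. {"cluster_arn": "..."} (hypothetical key)
    client = boto3.client("kafka")
    print(json.dumps(subnets_for_cluster(client, query["cluster_arn"])))


# A real script would end with a call to main().
```

Passing the client in as an argument keeps the lookup testable with a stub object instead of real AWS credentials.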

There are not too many parameters to pass to the external data source – just two!

External data source parameter    Description
program                           Program to execute (Python or shell script) by the external data source
query                             Variables to pass to the program as JSON

Note: The program is executed locally
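Because the program runs locally, it can be exercised without Terraform at all. The sketch below is a small Python harness that imitates what the external data source does with program and query: it sends the query as JSON on stdin and parses a JSON object from stdout. The helper name run_like_terraform and the echo program are made up for this example, and the harness is only an approximation of Terraform's behaviour.

```python
import json
import subprocess
import sys


def run_like_terraform(program, query):
    """Run `program` roughly the way the external data source does:
    send `query` as JSON on stdin and parse a JSON object from stdout."""
    completed = subprocess.run(
        program,
        input=json.dumps(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    result = json.loads(completed.stdout)
    # Reject anything other than a flat object of strings.
    if not isinstance(result, dict) or not all(
        isinstance(value, str) for value in result.values()
    ):
        raise ValueError("program must print a JSON object with string values only")
    return result


# Stand-in program for illustration: a Python one-liner that echoes p_env back.
echo_program = [
    sys.executable,
    "-c",
    "import json, sys; q = json.load(sys.stdin); print(json.dumps({'env': q['p_env']}))",
]
print(run_like_terraform(echo_program, {"p_env": "dev"}))
```

Swapping echo_program for a list such as ["bash", "scripts/get_ip_port.sh"] gives a quick way to check a script before wiring it into Terraform.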

So here is the use case for this post: we want to return an IP address and a port based on the environment passed to the script as a parameter. This information is not available in Terraform as it always magically changes ;). Before we move on: the shell script and Python examples do pretty much the same thing, purely for the purpose of illustration.

External Data Source & Shell Script

Let’s look at how to get the Terraform external data source to call a shell script. The following steps will be performed:

  1. Create a shell script
  2. Call the shell script from the Terraform external data source
  3. Pass some parameters to it and do some processing
  4. Once the processing is done, the script returns the output to Terraform as a JSON object

Nothing fancy!

Note: Make sure jq is installed (unless you want to hand-crank JSON objects), along with any other libraries or packages your shell script needs.

Terraform Script

Our Terraform script ext_data_source.tf is pretty simple: it calls the shell script get_ip_port.sh and passes a parameter p_env with the value dev. This value could be anything else, for example something generated dynamically while Terraform provisions resources!

data "external" "get_ip_addres_using_shell_dev" {
  program = ["bash","scripts/get_ip_port.sh"]
  query = {
    p_env = "dev"
  }
}

output "ip_address_for_dev" {
  value = data.external.get_ip_addres_using_shell_dev.result.ip_address
}

output "port_num_for_dev" {
  value = data.external.get_ip_addres_using_shell_dev.result.port_num
}

Shell Script

#!/bin/bash

# Step#0 - Magical list of ip addresses and ports which cannot exist in terraform
declare -A test_var

test_var["dev"]="10.0.0.1:8081"
test_var["qa"]="10.0.0.2:8082"
test_var["uat"]="10.0.0.3:8083"
test_var["stage"]="10.0.0.4:8084"
test_var["prod"]="10.0.0.5:8085"

# Step#1 - Parse the input
eval "$(jq -r '@sh "p_env=\(.p_env)"')"


# Step#2 - Extract the ip address and port number based on the key passed
url_str="${test_var[$p_env]}"
# Split "ip:port" on the colon without relying on unquoted word splitting
IFS=':' read -r IP_ADDRESS PORT_NUM <<< "$url_str"

# Step#3 - Create a JSON object and pass it back
jq -n --arg ip_address "$IP_ADDRESS" \
      --arg port_num "$PORT_NUM" \
      '{"ip_address": $ip_address, "port_num": $port_num}'

Let’s look at the shell script in a bit more detail:

  • Step#1 – The input is parsed as JSON and the value is extracted into a variable called p_env
  • Step#2 – The variable p_env is used to look up the IP address and port number in the hashmap
  • Step#3 – jq is used to build and return the JSON object which holds the IP address and port number

Let’s quickly take a look at the output when the Terraform script is applied

Terraform output

External Data Source & Python

Terraform Script

Our Terraform script ext_data_source.tf is pretty simple: it calls the Python script get_ip_port.py and passes a parameter p_env with the value qa. As you can see, this is very similar to the one in the previous section.

data "external" "get_ip_addres_using_python" {
  program = ["python3","scripts/get_ip_port.py"]
  query = {
    p_env = "qa"
  }
}

output "ip_address_for_qa" {
  value = data.external.get_ip_addres_using_python.result.ip_address
}

output "port_num_for_qa" {
  value = data.external.get_ip_addres_using_python.result.port_num
}

Python Script

import sys
import json

# Magical list of ip addresses and ports which cannot exist in terraform
test_var = dict()
test_var["dev"]   = "10.0.0.1:8081"
test_var["qa"]    = "10.0.0.2:8082"
test_var["uat"]   = "10.0.0.3:8083"
test_var["stage"] = "10.0.0.4:8084"
test_var["prod"]  = "10.0.0.5:8085"

# Step#1 - Parse the input (read as JSON from stdin, without shadowing the built-in input)
input_json = json.load(sys.stdin)

# Step#2 - Extract the ip address and port number based on the key passed
arr = test_var[input_json.get("p_env")].split(":")
ip_address = arr[0]
port_num = arr[1]


# Step#3 -  Create a JSON object and just print it(i.e send it to stdout)
output = {
    "ip_address": ip_address,
    "port_num": port_num
}

output_json = json.dumps(output,indent=2)
print(output_json)

Below is a quick recap of the steps in the Python script, which are very similar to those of the shell script in the earlier section:

  • Step#1 – Parse the input
  • Step#2 – Extract the IP address and port number from the dictionary
  • Step#3 – Return the JSON object which holds the IP address and port number
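The script above fails with a raw KeyError traceback if p_env is not one of the known environments. A more defensive variant, sketched below, prints a one-line message to stderr and returns a non-zero exit status, so that Terraform reports a readable error. The names ENDPOINTS, lookup and main and the message prefix are choices made for this sketch.

```python
import json
import sys

# Same illustrative lookup table as in the script above.
ENDPOINTS = {
    "dev": "10.0.0.1:8081",
    "qa": "10.0.0.2:8082",
    "uat": "10.0.0.3:8083",
    "stage": "10.0.0.4:8084",
    "prod": "10.0.0.5:8085",
}


def lookup(query):
    """Map the p_env key of the query to an ip_address/port_num pair."""
    env = query.get("p_env")
    if env not in ENDPOINTS:
        raise ValueError(f"unknown environment: {env!r}")
    ip_address, port_num = ENDPOINTS[env].split(":")
    return {"ip_address": ip_address, "port_num": port_num}


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Return the exit status: 0 on success, 1 on failure."""
    try:
        # json.JSONDecodeError is a subclass of ValueError, so bad input lands here too.
        result = lookup(json.load(stdin))
    except ValueError as exc:
        # Terraform surfaces stderr to the user when the program exits non-zero.
        print(f"get_ip_port: {exc}", file=stderr)
        return 1
    json.dump(result, stdout)
    return 0
```

In a real script the last line would be sys.exit(main()).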

See below the output when the Terraform script is applied

Terraform output

As you can see, the external data source is a really helpful tool for interfacing Terraform with the outside environment, and it is not difficult to grasp either. Hope you find this post useful. If you like it, please do share it and spread the knowledge! 🙂
