Terraform, Docker & CI/CD

Introduction: Terraform, Docker and CI/CD are not just buzzwords but must-have tools. They are used in some combination in most modern cloud ecosystems. They are often made to look really difficult, but in reality you only need to take one step at a time, slowly string them together and keep improving them. With most languages, we create artefacts and store them in an artefact repository such as Nexus or JFrog. The same applies to Docker images, which can be stored in private or public Docker image repositories. These artefact repositories version artefacts and keep them well structured. However, when I turned to Terraform, I found the same missing. Why? The answer you get most often is that it is configuration, or something along those lines. So what can be done to get around this problem of missing versioning for our infrastructure? … Read more

Terraform – External Data Source

The Terraform external data source is a very interesting feature which not many people seem to use. If you use Terraform in your day-to-day work, it is a nifty tool. This post talks about the Terraform external data source and how it can be used for some really interesting stuff. Introduction: In a nutshell, an external data source, as the name (not a great one!) implies, gets some information from an outside source and presents it to Terraform. The external data source executes a shell script, a Python script or, for that matter, any other program. Terraform then uses the output of the program like any other data source. This means that an external data source gives Terraform a way to interface with the outside world, which can be super helpful! In this blog post, we will cover both flavours of scripts – shell scripts and python … Read more
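As a rough illustration of the idea above, here is a minimal sketch of a Python program that an external data source could call. It relies on the protocol the external data source uses: the query arrives as a JSON object on standard input, and the program answers with a JSON object of string values on standard output. The function names, the `name` query key and the returned keys are hypothetical and chosen only for this example.

```python
import io
import json


def build_result(query):
    """Build the answer for Terraform: a flat dict of string keys to string values."""
    # "name" is a hypothetical query key used only for this illustration.
    name = query.get("name", "world")
    return {"greeting": f"hello {name}", "name_length": str(len(name))}


def run(stdin, stdout):
    """Read the JSON query from stdin and write the JSON result to stdout."""
    query = json.load(stdin)
    json.dump(build_result(query), stdout)


# Demonstration with in-memory streams standing in for the real ones.
# In an actual script you would call run(sys.stdin, sys.stdout) instead.
fake_stdin = io.StringIO('{"name": "terraform"}')
fake_stdout = io.StringIO()
run(fake_stdin, fake_stdout)
result = json.loads(fake_stdout.getvalue())
```

Every value is converted to a string before it is returned, because the external data source expects a flat map of strings rather than nested or numeric values.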

Airflow – External Task Sensor

The Airflow External Task Sensor deserves a separate blog entry. It is a really powerful feature in Airflow and can help you sort out dependencies for many use cases, which makes it a must-have tool. This blog entry introduces the external task sensor and how it can be quickly implemented in your ecosystem. Before you dive into this post, if this is the first time you are reading about sensors, I would recommend you read the following entry: Airflow – Sensors. Why use External Task Sensor? Here is my thought as to why an external task sensor is very useful. Most traditional scheduling is time-based. This means that the dependencies between jobs are based on an assumption that the first job will definitely finish before the next job starts. But what happens if the first job fails, or is processing more data than usual and is delayed? Well, we have what is called … Read more

Airflow – Sensors

Airflow sensors are like operators but perform a special task in an Airflow DAG. They check for a particular condition at regular intervals, and when it is met they pass control to downstream tasks in the DAG. There are various types of sensors, and in this mini blog series we intend to explore them. Before you read further, in case you are just beginning to learn Airflow, do have a look at these blog posts: Getting Started with Airflow, First DAG, Airflow Connections. Introduction: The fastest way to learn how to use an Airflow sensor is to look at an example. In this blog post, we will be looking at an example using S3KeySensor for reading a file as soon as it arrives in S3. There are other sensors available as well. Some of them are: S3 Key Sensor, SQL Sensor, HTTP Sensor, HDFS Sensor, Hive Sensor … Read more
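The "check a condition at regular intervals" behaviour described above can be sketched in plain Python. This is only an illustration of the polling idea and not Airflow's sensor API: the `wait_for` function, its parameters and the simulated file arrival are all hypothetical.

```python
import time


def wait_for(condition, poke_interval=1.0, timeout=10.0):
    """Poll condition() until it returns True or the timeout expires.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True  # condition met: downstream work may start
        if time.monotonic() >= deadline:
            return False  # gave up waiting
        time.sleep(poke_interval)


# Simulated "has the file arrived?" check: False twice, then True.
arrivals = iter([False, False, True])
met = wait_for(lambda: next(arrivals), poke_interval=0, timeout=5)

# A condition that never becomes true times out.
timed_out = wait_for(lambda: False, poke_interval=0, timeout=0)
```

In a real DAG the sensor class takes care of this loop; you would only supply what to check and how often.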

Airflow – Dynamic DAGs

Introduction: Over the last couple of months I got lots of queries about dynamic DAGs, so here is a post on dynamic DAGs in Airflow. Dynamic tasks are probably one of the best features of Airflow. They allow you to launch Airflow tasks dynamically inside an Airflow DAG. A simple use case is launching a shell script with each of the parameters in a list, all at the same time. To make things more fun, the list size changes all the time. This is where Airflow really shines. Let’s see how. If you are new to Airflow, read these posts to get some background before you jump into this one: Airflow getting started, How to write a DAG, Airflow Variables. Airflow DAG – Dynamic Tasks – Example-1: Creating dynamic tasks in a DAG is pretty simple. For the purpose of this section, … Read more

Airflow & SLA Management

Introduction: I came across this interesting feature of managing SLAs natively in Airflow. Failures can be caused not just by an actual failure of a task/pipeline but also by a slow-running task/pipeline. A slow-running task/pipeline may cause downstream tasks or DAGs which depend upon it to fail. This thought got me searching, and I decided to write a post about Airflow and SLA management. What I found was a simple solution which can help manage failures and delays. Before you jump into understanding how to manage SLAs in Airflow, make sure you are familiar with how to create Airflow DAGs. This post is divided into the following parts. A time duration is defined on a task. Airflow will monitor the performance of the task/DAG when SLAs are enabled. When SLAs are breached, Airflow can trigger the following: an email notification, or an action via a callback function call. Note 1: SLA checks occur … Read more
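The idea of a time budget on a task plus a callback on breach can be sketched in a few lines of plain Python. This is a conceptual illustration only and not Airflow's SLA API: `check_sla`, `record_miss` and the task names are hypothetical.

```python
from datetime import timedelta

missed = []


def record_miss(task_id, elapsed, sla):
    """Hypothetical callback: record how far over its SLA a task ran."""
    missed.append((task_id, elapsed - sla))


def check_sla(task_id, elapsed, sla, on_miss):
    """Call on_miss and return True if the task ran longer than its SLA."""
    if elapsed > sla:
        on_miss(task_id, elapsed, sla)
        return True
    return False


# A task with a 30-minute SLA that took 45 minutes breaches it.
breached = check_sla("load_orders", timedelta(minutes=45), timedelta(minutes=30), record_miss)

# A task that finished in 10 minutes is within its SLA.
on_time = check_sla("load_users", timedelta(minutes=10), timedelta(minutes=30), record_miss)
```

The callback is where a notification or a corrective action would go; here it simply records the overrun.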

Terraform – AWS VPCs

Talks about VPCs, Subnets & CIDR blocks, Routing tables, Internet Gateway and NAT Gateway. Introduction: Wow, finally made it to VPCs. This is the one I always wanted to write. VPCs are at the very heart of AWS. VPC stands for Virtual Private Cloud and, as the name suggests, it is your own cloud, your own thing. All the resources you create reside inside a VPC. Until now we have only used the default VPC. No more: from this blog onwards, an exciting world awaits. A VPC isolates your resources at the network level, to the point where connecting two resources that sit in different VPCs requires quite a special effort (i.e. the mystical beast called VPC peering). You can create as many VPCs as you want. This blog post is mostly theory about two concepts – Subnets and Routing. While you are inside these … Read more
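To make the relationship between a VPC's CIDR block and its subnets concrete, here is a small sketch using Python's standard `ipaddress` module. The `10.0.0.0/16` range and the public/private labels are assumptions picked for illustration, not values taken from the post.

```python
import ipaddress

# Hypothetical VPC address range: a /16 holds 65,536 addresses.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC into /24 subnets of 256 addresses each.
subnets = list(vpc.subnets(new_prefix=24))

# Pick the first two as illustrative "public" and "private" subnets.
public_subnet = subnets[0]
private_subnet = subnets[1]

# Every subnet must fall inside the VPC's CIDR block.
inside_vpc = public_subnet.subnet_of(vpc) and private_subnet.subnet_of(vpc)
```

The prefix length controls the size: each extra bit in the prefix halves the number of addresses in the block.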

Terraform – Provisioners

Introduction: Terraform provisioners allow you to execute custom scripts/code either on the remote machine which is being provisioned or locally on the machine where the Terraform code is being executed. There are various provisioners available in Terraform. In this blog, we will look at the generic provisioners – file, local and remote provisioners. The blog builds on this previous blog and is divided into the following sections. All the code for these sections is available on GitHub as well. The link is provided at the end of each section. File Provisioner: The file provisioner allows you to transfer files from the local machine on which the Terraform code is being executed to the resource which is being provisioned. For example, you may want to copy a configuration file or directory from your local machine to your EC2 instance. This can be easily accomplished with the file provisioner. Here is the example of the instance.tf which … Read more

Terraform – State

Introduction: Terraform state is something which is usually tackled quite late in the game, when folks have become comfortable with writing Terraform code. But we are going to do it much earlier :). You will see that it helps you visualise your code a lot better and come up with more interesting/innovative ideas. In this blog entry, we discuss Terraform state by breaking it down into smaller chunks. You can find all the code for this blog here. Look for blog-5. So let’s start off. What is Terraform state? Terraform state is a record of any infrastructure or configuration which has been provisioned by Terraform. Simple as that. So, whenever you add/change any part of the Terraform code and apply it, Terraform will compare it with the state and apply only the required changes. It does not execute the whole code, which makes it pretty performant if you have large infrastructures. … Read more
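The "compare with the state and apply only the required changes" idea can be sketched as a dictionary diff. This is a deliberately simplified illustration, not how Terraform is implemented; Terraform's real planning is considerably more involved. The `plan` function and the resource names and attributes below are hypothetical.

```python
def plan(desired, state):
    """Compare the desired configuration with the recorded state.

    Both arguments map a resource address to a dict of attributes.
    Returns only the changes needed to make the state match the config.
    """
    to_create = {k: v for k, v in desired.items() if k not in state}
    to_update = {k: v for k, v in desired.items() if k in state and state[k] != v}
    to_delete = [k for k in state if k not in desired]
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical desired configuration (what the code says).
desired = {
    "aws_instance.web": {"instance_type": "t3.micro"},
    "aws_s3_bucket.logs": {"bucket": "example-logs"},
}

# Hypothetical recorded state (what was provisioned earlier).
state = {
    "aws_instance.web": {"instance_type": "t2.micro"},
    "aws_security_group.old": {"name": "old"},
}

changes = plan(desired, state)
```

Resources that are identical in both the configuration and the state do not appear in the result at all, which is why unchanged infrastructure is left untouched.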

Terraform – Variables

Introduction: Until now we have kept the code pretty simple and straightforward. In the real world, it hardly ever is. Similar to other languages, Terraform provides different types of variables, and they can be used in many different ways. In this blog, we will introduce the various types of Terraform variables and see how we can use them inside Terraform code. The blog post is divided into the following sections. As you can see, it makes a pretty long read, but it will be very helpful in your projects. The various ways of passing variables to Terraform help you manage: What is source controlled and what is NOT! How are secrets managed? How is Terraform code executed? This blog entry creates the same infrastructure which was built in the previous blog post but uses variables to make it more reusable. Before we get started all the … Read more