Airflow – Connections

Introduction

We understand that Airflow is a workflow/job orchestration engine and can execute various tasks by connecting to our environments. There are various ways to connect to an environment, but all of them require the connection details of that environment. For example:

  • Postgres DB – Hostname, Port, Schema
  • SSH – Hostname which allows SSH connections.

The list may extend to AWS, Google Cloud or Azure as well, but all of them need some sort of connection information. Airflow is no different and needs connection information too.
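As a concrete illustration, Airflow stores a connection's details as a single URI, and a connection can even be supplied through an `AIRFLOW_CONN_<CONN_ID>` environment variable in that form. A minimal sketch with made-up credentials (all values below are hypothetical examples):

```python
# Hypothetical Postgres connection details (example values only)
host, port, schema = "db.example.com", 5432, "sales"
login, password = "report_user", "s3cret"

# Airflow stores these fields as one URI; the same form works for
# connections supplied via AIRFLOW_CONN_<CONN_ID> environment variables.
conn_uri = f"postgres://{login}:{password}@{host}:{port}/{schema}"
print(conn_uri)
```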

One only needs to log in and go to Admin->Connections to see the exhaustive list of connection types which are possible.

Airflow supports many connection types, and some of the documentation can be found on this link. You can see the various types of connections which can be made by Airflow; this makes connecting to different types of technologies very easy. Keep in mind that you can create your own connection type if you do not find one which suits you. Though I doubt you will need to, but who knows!

Going into all the different types of connections is really not possible here. However, we can look at one of the most often used ones – the SSH connection. It is one of the most widely used Airflow connections, which makes it a good example for this blog entry. We will also look at a few more as we go through the Airflow blog series.

SSH Airflow Connections

The use case for this is pretty simple. As part of our Airflow workflow orchestration, there is a requirement to execute a shell script or a Linux command on a remote server. Let's get started.

For this blog entry – I have already created an EC2 instance which has a working shell script. Now airflow should be able to execute this shell script as a part of a DAG.

The shell script is at /home/ec2-user/my_test.sh:

#!/bin/bash
# Quote the arguments so the asterisks are not glob-expanded by the shell
echo "******Hello World from AWS EC2 Instance*******"
echo "$(hostname -i)"

Nothing fancy, but it should give us a good idea of what an Airflow connection can do.

The following steps are involved

Step-1 – Go to Admin->Connections

Step-2 – Click on Create

Step-3 – Enter the connection details and Save

Note: There are a couple of catches before you jump ahead

  1. Make sure you are able to access your Linux box from the Airflow server before jumping ahead.
  2. Install the Python libraries the SSH hook depends on and their dependencies
    • paramiko
    • sshtunnel
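A quick preflight from the Airflow server covers both catches. The key path and hostname below are placeholders for your own EC2 instance:

```shell
# Install the SSH dependencies (package names as of the airflow.contrib era)
pip install paramiko sshtunnel

# Verify the Airflow server can actually reach the box
# (replace the key path and host with your own)
ssh -i ~/.ssh/my_key.pem ec2-user@ec2-xx-xx-xx-xx.compute.amazonaws.com 'echo ok'
```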

The list of connections now shows the new SSH connection.

Step-4 – Create a DAG

Below is the DAG used for this blog entry. Add this to the $DAG_HOME as hello_connections1.py

# Filename: hello_connections1.py
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator
from airflow.contrib.hooks.ssh_hook import SSHHook
from datetime import datetime, timedelta

default_args = {
  'owner': 'airflow',
  'depends_on_past': False,
  'start_date': datetime(2019, 8, 31),
  'email': ['airflow@example.com'],
  'email_on_failure': False,
  'email_on_retry': False,
  'retries': 1,
  'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
}

dag = DAG(
    'hello_connections1',
    schedule_interval='0 0 * * *',
    default_args=default_args,
)

sshHook = SSHHook(ssh_conn_id='my_blog_ssh_connection')
linux_command = "sh /home/ec2-user/my_test.sh "

t1 = SSHOperator(
    ssh_hook=sshHook,
    task_id='test_remote_script',
    command=linux_command,
    dag=dag)

Note: Observe that the string linux_command ends with a space. Without it, the command ends in .sh, and Airflow interprets any templated field ending in .sh as the path to a Jinja template file to render rather than as an inline command. More on this can be found on this link
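To see why the trailing space matters, here is a simplified model of the check Airflow performs: a templated field whose value ends in one of the operator's template extensions (such as .sh) is treated as a file path to render. This sketch only mimics that behaviour; it is not Airflow's actual code.

```python
# Simplified model of Airflow's template-file detection (not the real code):
# a templated field ending in one of these extensions is read as a file.
TEMPLATE_EXT = ('.sh', '.bash')

def treated_as_template_file(command):
    return command.endswith(TEMPLATE_EXT)

print(treated_as_template_file("sh /home/ec2-user/my_test.sh"))   # True  -> Airflow tries to load the path as a Jinja template
print(treated_as_template_file("sh /home/ec2-user/my_test.sh "))  # False -> trailing space, runs as an inline command
```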

When the DAG is executed, the following is the output. The relevant lines are highlighted.

The Airflow server connects to the remote AWS EC2 instance, executes the shell script and writes the results to the Airflow log. As you can see, Airflow can now connect to different servers, execute commands and complete more complicated workflows.
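If you want to exercise the task once by hand rather than waiting for the schedule, the Airflow 1.10-era CLI can run a single task instance using the DAG id and task id defined above (this assumes a working Airflow install with the SSH connection configured):

```shell
# Run one task instance without the scheduler (Airflow 1.10 CLI syntax)
airflow test hello_connections1 test_remote_script 2019-08-31

# Or trigger the whole DAG
airflow trigger_dag hello_connections1
```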

This brings us to the end of the blog. Hope you find this entry useful. Till next time …byeeeeeeee!
