Introduction
We understand that Airflow is a workflow/job orchestration engine that can execute various tasks by connecting to our environments. There are various ways to connect to an environment, but each of them needs connection details about that environment. For example:
- Postgres DB – Hostname, Port, Schema
- SSH – Hostname which allows SSH connections.
The list may extend to AWS, Google Cloud or Azure as well, but all of them need some sort of connection information. Airflow is no different and needs connection information too.
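To make that concrete before we go any further, here is a minimal sketch of how pipeline code consumes such connection details by referring to a connection id rather than hard-coding host names and credentials. The connection id my_postgres_conn is hypothetical and would have to be defined in Airflow first.

from airflow.hooks.postgres_hook import PostgresHook

# 'my_postgres_conn' is a hypothetical connection id; Airflow looks up the
# stored host, port, schema and credentials for it at run time.
hook = PostgresHook(postgres_conn_id='my_postgres_conn')
rows = hook.get_records('SELECT 1')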
One only needs to log in and go to Admin -> Connections to see the exhaustive list of connections which are possible.


Airflow allows for various connections and some of the documentation can be found on this link. You can see the various types of connections which Airflow can make, and this makes connecting to different types of technologies very easy. Keep in mind you can create your own connection types if you do not find one which suits you. Though I doubt you will need to, but who knows!
Going into all the different types of connections is really not possible here. However, we can look at one of the most often used ones: SSH connections. This is one of the most widely used Airflow connections, which makes it a good example for this blog entry. We will also look at a few more as we go through the Airflow blog series.
SSH Airflow Connections
The use case for this is pretty simple. There is a requirement to execute a shell script on a remote server. As part of our Airflow workflow orchestration, we need to enable Airflow to execute a shell script or Linux command on a remote server. Let's get started.
For this blog entry, I have already created an EC2 instance which has a working shell script. Airflow should now be able to execute this shell script as part of a DAG.
The shell script is in /home/ec2-user/my_test.sh and contains a single line:
echo $(hostname -i)
Nothing fancy, but it should give us a good idea about what an Airflow connection can do.
The following steps are involved
Step-1 – Go to Admin -> Connections

Step-2 – Click on Create

Step-3 – Enter the connection details and Save

Note: There are a couple of catches before you jump ahead:
- Make sure you are able to SSH into your Linux box from the Airflow server.
- Install the sshtunnel python library and its dependencies. The SSH hook also needs paramiko; installing the ssh extra with pip install 'apache-airflow[ssh]' pulls in both.
The list of connections now shows the new SSH connection.

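If you prefer to script this instead of clicking through the UI, the same connection can be created programmatically. This is only a sketch under a few assumptions: the host name and key file path are placeholders you would replace with your own EC2 details, and the extras shown (key_file, no_host_key_check) are options the contrib SSH hook understands.

from airflow import settings
from airflow.models import Connection

# Placeholder host and key file - substitute your own EC2 instance details.
conn = Connection(
    conn_id='my_blog_ssh_connection',
    conn_type='ssh',
    host='ec2-xx-xx-xx-xx.compute.amazonaws.com',
    login='ec2-user',
    port=22,
    extra='{"key_file": "/home/airflow/.ssh/my_key.pem", "no_host_key_check": "true"}')

session = settings.Session()
session.add(conn)     # note: running this twice will create a duplicate entry
session.commit()

Either way, the connection ends up in the same metadata database that the Admin -> Connections page reads from.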
Step-4 – Create a DAG
Below is the DAG used for this blog entry. Add it to the $DAG_HOME as hello_connections.py.
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator
from airflow.contrib.hooks.ssh_hook import SSHHook
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 8, 31),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG(
    'hello_connections1',
    schedule_interval='0 0 * * *',
    default_args=default_args
)

# Build the hook from the connection created in Step-3.
sshHook = SSHHook('my_blog_ssh_connection')

# The trailing space is deliberate - see the note below the code.
linux_command = "sh /home/ec2-user/my_test.sh "

t1 = SSHOperator(
    ssh_hook=sshHook,
    task_id='test_remote_script',
    command=linux_command,
    dag=dag)
Note: Observe the trailing space at the end of the linux_command string. Without it, the command ends in .sh and Airflow treats the string as the path to a Jinja template file to render. More on this can be found on this link.
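As a side note, the SSHOperator can also build the hook itself when you hand it the connection id, so the explicit SSHHook above is optional. Here is a sketch of the same task written that way:

t1 = SSHOperator(
    task_id='test_remote_script',
    ssh_conn_id='my_blog_ssh_connection',   # operator constructs the SSHHook internally
    command=linux_command,                  # keep the trailing space for the same reason
    dag=dag)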
When the DAG is executed, the following is the output. The relevant lines are highlighted.

The Airflow server connects to the remote AWS EC2 instance, executes the shell script and writes the output to the Airflow task log. As you can see, Airflow can now connect to different servers, execute commands and complete more complicated workflows.
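To hint at those more complicated workflows, here is a hypothetical second task reusing the same connection; the task id and command are made up purely for illustration, and the >> operator wires the dependency so it only runs after the first task succeeds.

# Hypothetical follow-up task over the same SSH connection.
t2 = SSHOperator(
    task_id='record_run_on_remote',
    ssh_conn_id='my_blog_ssh_connection',
    command='echo "run finished on $(hostname -i)" >> /home/ec2-user/run_history.log',
    dag=dag)

t1 >> t2   # t2 starts only once t1 has succeeded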
This brings us to the end of the blog. Hope you find this entry useful. Till next time …byeeeeeeee!
Comments

Thanks for the useful steps. I have done something similar, but the difference is that my shell script on the target server is a docker start script, and that script contains a few environment keys and passwords. My issue is that the shell script runs perfectly, but in the execution log I can see my environment variables and passwords exposed. Did you face a similar issue on your side as well?
Thank you for the simplicity of the explanation, which allows one to jump straight into the script coding. However, I have a question I have not found an answer to yet: what if the remote job you are triggering is long running, e.g. 12 hours, and you cannot keep the SSH tunnel open for the whole period of job execution? How would you monitor whether the job has finished with a success or failure if another job depends on it?
I am trying to set up a connection from Airflow to my AWS EC2 Linux server using the SSH connection option. Normally I am able to SSH to that EC2 instance, but when I try the Airflow SSH connection test I get:
[Errno 2] No such file or directory: ‘/Users/archanakarpe/Downloads/20-march.pem’
The path and permissions work when I SSH directly to that EC2 instance.
So, as per what this airflow-connections document says:
"Note: There are a couple of catches before you jump ahead
Make sure you are able to access your Linux box from the airflow server before jumping ahead.
Install the following python library and its dependencies
ssh_tunnel"
So could someone help me with how to set up this ssh_tunnel in my case, or is there something else I am doing wrong?
Please guide me.