Airflow – Variables

In the last entry on Airflow (which was many moons ago!) we created a simple DAG and executed it. It was simple enough to get us started, but it did not allow us any flexibility: we could not change the behaviour of the DAG. In real-world scenarios, we may need to change the behaviour of a workflow based on certain parameters. This is accomplished with Airflow Variables.

Airflow Variables are simple key-value pairs stored in the database that holds the Airflow metadata. These variables can be created and managed via the Airflow UI or the Airflow CLI.

Airflow WebUI -> Admin -> Variables

Some of the features of Airflow Variables are listed below

  • Can be defined as a simple key-value pair
  • One variable can hold a list of key-value pairs as well!
  • Stored in the Airflow metadata database
  • Can be used in Airflow DAG code as Jinja variables
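As a rough mental model of the points above (the names here are illustrative, not Airflow's actual internals): each variable is one row with a key column and a text value column, and a JSON string lets a single key carry several pairs — much like Airflow's own `Variable.get(..., deserialize_json=True)`.

```python
import json

# Toy stand-in for the variables table in the metadata database:
# one key column, one value column (always text).
variables = {}

def set_var(key, value):
    # Dicts/lists are serialized to JSON so a "list of key-value
    # pairs" still fits in a single text column.
    variables[key] = value if isinstance(value, str) else json.dumps(value)

def get_var(key, deserialize_json=False):
    raw = variables[key]
    return json.loads(raw) if deserialize_json else raw

set_var("v_name", "World")
set_var("another_var", {"v_command": "ls /var/log"})
print(get_var("v_name"))                              # World
print(get_var("another_var", deserialize_json=True))  # {'v_command': 'ls /var/log'}
```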

Define a single key-value variable

Airflow variables can be created in three ways

  1. Using Airflow Web UI
  2. Airflow CLI
  3. File Import

Using Airflow Web UI

Defining a variable that holds a single value is done in the following steps

Step 1 – Navigate to Airflow UI->Admin->Variables

Step 2 – Press Create

Step 3 – Define the variable

Press Save to create the variable.

Airflow CLI

Variables can also be created via the Airflow CLI. Use the following command to create a variable from the command line

airflow variables --set my_new_var testval

The variable is then also visible in the Airflow Web UI

The CLI supports creating, deleting and importing variables and is quite easy to use; see the Airflow CLI documentation for details.

Usage

Variables can quite easily be used in code as Jinja variables. Below is an example of a DAG which uses one of the variables we defined earlier – v_name

You can access a variable using the following API

var.value.<variable name>

For example, var.value.v_name. See below

# Filename: hello_world_variables.py
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
  'owner': 'airflow',
  'depends_on_past': False,
  'start_date': datetime(2019, 7, 13),
  'email': ['vipin.chadha@gmail.com'],
  'email_on_failure': False,
  'email_on_retry': False,
  'retries': 1,
  'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'hello_world_variables',
    schedule_interval='0 0 * * *',
    default_args=default_args,
)

# Use of variable in the command
create_command = 'echo {{var.value.v_name}}'

t1 = BashOperator(
  task_id='print_date',
  bash_command='date',
  dag=dag
)

t2 = BashOperator(
  task_id='echo_my_variable',
  bash_command=create_command,
  dag=dag
)

t2.set_upstream(t1)
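Since bash_command is a Jinja template, Airflow renders {{var.value.v_name}} before the task runs. That rendering step can be sketched outside Airflow using plain jinja2 and a stub var object — the in-memory store below is hypothetical; in Airflow the lookup hits the metadata database:

```python
from types import SimpleNamespace
from jinja2 import Template

# Hypothetical in-memory stand-in for the variables held in the
# metadata database.
store = {"v_name": "World"}

# Mimic the `var.value.<name>` attribute access that Airflow exposes
# inside templates.
var = SimpleNamespace(value=SimpleNamespace(**store))

create_command = 'echo {{ var.value.v_name }}'
rendered = Template(create_command).render(var=var)
print(rendered)  # echo World
```

The rendered string is what the BashOperator would actually execute.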

Let’s execute the DAG and see the output

Once the DAG is executed you can see the logs using the following steps

  1. Click on the DAG
  2. Click on Graph View and then echo_my_variable
  3. Click on View Log

The log of the task is shown below with the output of our command highlighted

Define a variable – list of key-value pairs

If your DAG uses more than one variable then (depending upon the use case) they can be stored as a list of key-value pairs in a single Airflow variable. Such variables can be created via the UI or CLI, or imported from a file.

Airflow UI

Navigate to Airflow Web UI -> Admin -> Variables and create a variable called another_var. Observe how the key-value pairs are added as JSON. See below

Save, and you have a variable which is a list of key-value pairs

Airflow CLI

From the command line, a variable holding a list of key-value pairs can be created using the following command

airflow variables --set my_new_var1 '{ "v_command": "ls /var/log", "v_some_var": "HELLO FROM CLI" }'

The variable is now also available in the Airflow Web UI.
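Under the hood the variable value is stored as a plain string; the var.json accessor deserializes it before the key lookup. A quick stdlib sketch of that deserialization, using the value we just set:

```python
import json

# The raw string stored for my_new_var1, exactly as passed to the CLI.
raw = '{ "v_command": "ls /var/log", "v_some_var": "HELLO FROM CLI" }'

parsed = json.loads(raw)
print(parsed["v_command"])   # ls /var/log
print(parsed["v_some_var"])  # HELLO FROM CLI
```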

Usage

Let’s now see how a variable with a list of key-value pairs is used in code. The DAG below uses a key-value pair from the variable another_var. See below

# Filename: hello_world_variables_2.py
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
  'owner': 'airflow',
  'depends_on_past': False,
  'start_date': datetime(2019, 7, 19),
  'email': ['vipin.chadha@gmail.com'],
  'email_on_failure': False,
  'email_on_retry': False,
  'retries': 1,
  'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'hello_world_variables_2',
    schedule_interval='0 0 * * *',
    default_args=default_args,
)


create_command = '{{var.json.another_var.v_command}}'

t1 = BashOperator(
  task_id='print_date',
  bash_command='date',
  dag=dag
)

t2 = BashOperator(
  task_id='run_my_command',
  bash_command=create_command,
  dag=dag
)

t2.set_upstream(t1)

Once the DAG is executed you can see the logs using the following steps

  1. Click on the DAG
  2. Click on Graph View and then run_my_command
  3. Click on View Log

The log is shown below

This brings us to the end of this post. I hope you found it useful. Next up: how we can do branching. Till next time…byeeeeee
