Airflow stores its logs in the local filesystem by default, under the $AIRFLOW_HOME/logs directory. Storing logs on an external service such as AWS S3 instead is called remote logging, and Airflow can easily be configured to do this. This blog entry describes the steps required to configure Airflow to store its logs in an S3 bucket.
The blog entry is divided into the following sections
- Create S3 Connection
- Configure airflow.cfg
For this blog entry, we are running Airflow on an Ubuntu server which has access to AWS S3 buckets via the AWS CLI.
Note: If you are using an EC2 instance, please make sure that your instance has read-write access to the S3 buckets configured.
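Before going further, it is worth confirming that the machine can actually reach the bucket. A quick check with the AWS CLI would look something like this (the bucket name `your-bucket` is a placeholder):

```shell
# List the bucket to confirm read access (replace your-bucket with your bucket name)
aws s3 ls s3://your-bucket/

# Optionally confirm write access by uploading a small test file
echo "test" > /tmp/airflow-s3-test.txt
aws s3 cp /tmp/airflow-s3-test.txt s3://your-bucket/airflow-s3-test.txt
```

If either command fails with an access error, fix the instance's IAM role or AWS CLI credentials before touching Airflow.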
Create S3 Connection
To enable remote logging in Airflow, we need S3 support, which ships as an optional Airflow dependency and can be installed as part of the airflow pip install command. Please refer to this blog entry for more details.
Create an S3 connection from the Airflow UI under Admin → Connections – see below.
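If you prefer the command line, the same connection can be created with the Airflow CLI. The connection id `my_s3_conn` below is just a placeholder, and the exact syntax depends on your Airflow version (the form below is for Airflow 2.x; 1.10 used `airflow connections --add` instead):

```shell
# Create an AWS connection; with an EC2 IAM role, no explicit
# credentials need to be stored in the connection itself
airflow connections add my_s3_conn --conn-type aws
```

Whatever id you choose here is what we will reference in airflow.cfg in the next section.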
Once created, we need to reference this connection in airflow.cfg.
The airflow.cfg file is located in $AIRFLOW_HOME. Open it and make the following changes.
Set the following configuration parameters
- remote_logging = True
- remote_log_conn_id = <<your connection id created in above section>>
- remote_base_log_folder = s3://your-bucket/path/to/logs-folder
See example below.
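Putting it together, the relevant fragment of airflow.cfg would look something like the sketch below. Note that in Airflow 2.x these options live under the [logging] section, while older 1.10 releases kept them under [core]; `my_s3_conn` is a placeholder for whatever connection id you created above.

```ini
[logging]
remote_logging = True
remote_log_conn_id = my_s3_conn
remote_base_log_folder = s3://your-bucket/path/to/logs-folder
```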
That’s it! We are now ready to bounce our server and execute a DAG.
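How exactly you bounce the server depends on how Airflow is deployed. If the webserver and scheduler run as systemd services (the service names below are assumptions and may differ in your setup), something like this would do:

```shell
# Restart both components so they pick up the new logging configuration
sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-scheduler
```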
Let’s look at logs of one of the DAGs that we have executed.
Right-click on the task above to view the log
You will see that the log being shown is from a remote S3 bucket. See below
Let’s look at our S3 bucket as well. You will see that various folders and subfolders have been created, essentially matching the same directory structure that Airflow creates on the local filesystem.
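You can inspect the same structure from the command line with the AWS CLI. Airflow lays out remote logs per DAG, task, and run, so the listing should show paths along the lines of `<dag_id>/<task_id>/.../<try_number>.log` (the bucket and path below are placeholders matching the configuration above):

```shell
# Recursively list everything under the remote log folder
aws s3 ls s3://your-bucket/path/to/logs-folder/ --recursive
```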
This brings us to the end of this quick blog entry on configuring Airflow to write remote logs to an S3 bucket. I hope this helps you. If you liked this entry, do press like and share it!