Airflow Single-Engineer Group
About the Airflow SEG
Previous 5 videos
Date | TL;DW | Video |
---|---|---|
2023-01-12 | DAG overview page is now pretty | https://youtu.be/E3_YGF7Wr2k |
2023-01-05 | Developed the first Airflow page with an overview of DAGs | https://youtu.be/oFs4OsHZfRw |
2022-12-21 | First video that started this SEG | https://youtu.be/Jrjp6_rdDo4 |
Apache Airflow
Airflow is the de facto tool for data teams to schedule and execute ELT pipelines, machine learning pipelines, DevOps tasks, and really any task that requires scheduling. It's cron turned up to 11.
According to the Airflow project itself:
> Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
Source: https://airflow.apache.org
In Airflow, a workflow is defined as a Directed Acyclic Graph (DAG). A DAG contains tasks, and each task is implemented by an operator.
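To make the "directed acyclic graph" idea concrete, here is a minimal sketch that uses only Python's standard library, not Airflow's own API. The task names and the `execution_order` helper are hypothetical; the point is only that tasks plus their dependencies determine a valid run order, and that a cycle makes the workflow impossible to schedule.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ELT pipeline: each key is a task, each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform_orders": {"load"},
    "transform_customers": {"load"},
    "report": {"transform_orders", "transform_customers"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly what the "acyclic" in DAG rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two `transform_*` tasks do not depend on each other, so a scheduler is free to run them in either order or in parallel once `load` has finished.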
Typical workflow of developing DAGs
A typical development workflow looks like this:
- A user creates or updates a DAG locally
- The user pushes the code to Git
- A pipeline deploys the DAG to a non-production Airflow instance
- The user visits the Airflow webserver to inspect and run the DAG
- If the pipeline succeeds, the DAG makes its way to production via merge requests
Common challenges
Below are some common challenges related to Airflow, in no particular order:
- Airflow is single-tenant. In development, users overwrite each other's DAGs if they deploy to the same instance
- Airflow is quite difficult to set up properly for a production environment
- Developing DAGs is often very iterative
- It’s difficult to spot bugs in a DAG during code review without actually deploying it to an Airflow instance
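A small share of these bugs can be caught before anything is deployed. As a minimal sketch, assuming DAGs are plain Python files under a hypothetical `dags/` directory, a CI job could at least confirm that every file parses. The helper names `check_source` and `check_dag_dir` are made up for illustration, and the check only catches syntax errors; import errors, cycles, and logic bugs still need a running Airflow instance.

```python
import ast
from pathlib import Path


def check_source(source, filename="<dag>"):
    """Return None if the source parses, else a short error description."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def check_dag_dir(dag_dir):
    """Parse every .py file under dag_dir and map failing paths to errors."""
    errors = {}
    for path in sorted(Path(dag_dir).rglob("*.py")):
        problem = check_source(path.read_text(encoding="utf-8"), str(path))
        if problem is not None:
            errors[str(path)] = problem
    return errors


if __name__ == "__main__":
    # "dags" is an assumed location; adjust to the repository layout.
    for path, problem in check_dag_dir("dags").items():
        print(f"{path}: {problem}")
```

Because the files are only parsed and never executed, this check needs no Airflow installation and is safe to run on untrusted merge requests.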
GitLab integration
Below are a few initial options for integrating GitLab and Airflow:
- Integration of the DAG overview into GitLab
- GitLab as authentication provider for Airflow
- Using GitLab runners as compute for Airflow
- Using preview apps to create an instance of Airflow per MR to ease the code review process
- Provisioning an Airflow instance directly from GitLab
Last modified July 29, 2024: Fixing more formatting issues caused by the handbook migration (9b5eccc2)