
I'm trying to create a dynamically generated DAG in Airflow.

The main component is where my code dynamically generates a bunch of tasks to run simultaneously (using a custom Operator to create each task). All those tasks have the same shape (i.e. the same dependencies and the same downstream tasks), so I don't have the issue described in this question. In plain Python terms, I'd just define a function once and then instantiate it multiple times with different arguments.
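In code, the pattern I mean looks roughly like this. `MyCustomOperator` is a hypothetical stand-in (a plain class rather than a real Operator subclass) so the sketch runs without Airflow installed:

```python
# Stand-in for my custom Operator subclass (hypothetical), so this
# sketch is self-contained and runs without Airflow.
class MyCustomOperator:
    def __init__(self, task_id, source):
        self.task_id = task_id
        self.source = source

def make_task(source):
    # Factory: one call per input; every task has the same shape.
    return MyCustomOperator(task_id=f"process_{source}", source=source)

tasks = [make_task(s) for s in ("a", "b", "c")]

# In the real DAG these would then all be wired up identically, e.g.
# start >> tasks >> finish
```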

However, if I'm going to do it as a regular Python function, it seems strange to have so much regular Python mixed into my Airflow DAG. My understanding of Airflow was that it's best to keep all logic in custom operators, outside of the DAG.

On the other hand, I'm not sure if it's possible (and if so, how) to have a custom Operator which creates instances of a different Operator to be used directly in the DAG.

Is it bad practice to have Python logic directly within a DAG? If it's a bad idea, how should I implement this pattern?

(Using Airflow 1.10.12 and Python 3.7)

S.S.

1 Answer


It is, in general. The official documentation states in its best practices section that

In general, you should not write any code outside the tasks. The code outside the tasks runs every time Airflow parses the DAG, which happens every second by default.

You could define the generation logic as a function in a separate Python file and call it from the DAG file. Or maybe the new TaskFlow API of Airflow 2.0 helps you:

https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html
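The first suggestion could be sketched like this: keep the generation logic in a helper module so the DAG file itself stays thin. The module/file names are hypothetical, and `StubOperator` stands in for a real Operator class so the sketch runs without Airflow installed:

```python
# --- contents of e.g. helpers/task_factory.py (hypothetical path) ---
def build_tasks(operator_cls, sources):
    # One identically shaped task per source.
    return [operator_cls(task_id=f"process_{s}", source=s) for s in sources]

# --- contents of the DAG file, e.g. dags/my_dag.py ---
# from helpers.task_factory import build_tasks
class StubOperator:
    # Stand-in for a real Operator so this runs anywhere.
    def __init__(self, task_id, source):
        self.task_id = task_id
        self.source = source

tasks = build_tasks(StubOperator, ["a", "b"])
```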

Javier López Tomás