
I have a Git repository which (among other things) holds Airflow DAGs in the airflow directory. I have a clone of the repository beside the Airflow install directory, and the AIRFLOW_HOME environment variable points to the airflow directory of that clone.

I would like to allow imports from modules in the repository that live outside the airflow folder (see the structure below).

<repo root>
   |_airflow
      |_dags
         |_dag.py
   |_module1
   |_module2
   |_...

So that in dag.py I can do:

from module1 import Module1

Currently, this does not seem possible without tricks like explicitly editing sys.path, which is not very elegant and has to be repeated in each of the DAG source files.
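For reference, the per-file sys.path trick usually looks something like the minimal sketch below. The helper name `add_repo_root_to_sys_path` is hypothetical, and the sketch assumes the layout above, i.e. DAG files sit two directory levels below the repository root (`<repo root>/airflow/dags/dag.py`).

```python
import os
import sys


def add_repo_root_to_sys_path(dag_file, levels_up=2):
    """Prepend the repository root to sys.path and return it.

    Assumes the layout <repo root>/airflow/dags/dag.py, so the repo root is
    `levels_up` directories above the folder that contains `dag_file`.
    """
    root = os.path.dirname(os.path.abspath(dag_file))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    # Skip the insert if already present, so re-parsing the DAG file
    # does not stack duplicate entries on sys.path.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# At the top of every DAG file this would then be followed by:
#   add_repo_root_to_sys_path(__file__)
#   from module1 import Module1
```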

Making an installable package out of module1 is also out of the question.

sophros
  • Why can't you make them into a package? Is it a code privacy issue? If so, to what degree does it need to be kept private? – PirateNinjas Aug 27 '19 at 13:54
  • @PirateNinjas: You guessed right - one of the issues is the code privacy. – sophros Aug 27 '19 at 14:08
  • I'm aware of just 2 ways (both of which, I assume, you're already aware of): **[1]** package your code into an [Airflow plugin](https://airflow.apache.org/plugins.html) **[2]** Make code discoverable by updating the Python path; apart from updating `sys.path` programmatically, we can also (as I personally do) update `PYTHONPATH` once and for all in `.bashrc` – y2k-shubham Aug 28 '19 at 03:58
  • An alternative to juggling the path would be to obfuscate the code so you can deploy it as a package or directly as code. You could do this by using Cython to build your package into shared object files. This isn't a perfect solution, but I think it will work for your problem. – PirateNinjas Aug 28 '19 at 08:41
  • @y2k-shubham: Indeed, maybe Airflow plugins are the way to go. One base class that does the path manipulation could serve the purpose of setting the environment for all of the inheriting DAGs I create. Please convert the comment to an answer so that I can accept it. – sophros Aug 28 '19 at 10:01

1 Answer


Rewriting the conclusion from the discussion in the comments here:


Broadly, there are two possible ways:

  1. Package your code into an Airflow plugin
  2. Make your code discoverable to the processes that parse the DAG-definition file(s) by updating PYTHONPATH. Here again we have the following options:

    (a) Update PYTHONPATH at the system level via .bashrc (or equivalent) once and for all, or just export the updated PYTHONPATH for the current shell session

    (b) Programmatically update sys.path at the beginning of each DAG-definition file
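Option (a) works because Python adds the entries of PYTHONPATH to sys.path when an interpreter starts, so every process launched from an environment that has it set (scheduler, webserver, workers) can import from the repository root. In .bashrc that would be a line such as `export PYTHONPATH="/path/to/repo${PYTHONPATH:+:$PYTHONPATH}"`, where the path is a placeholder. The minimal sketch below checks that effect from Python by starting a fresh interpreter with a modified PYTHONPATH; the helper name `can_import_with_pythonpath` is hypothetical. Keep in mind that .bashrc is only read by interactive bash shells, so Airflow services started some other way (for example by systemd) would need PYTHONPATH set in their own environment.

```python
import os
import subprocess
import sys
import tempfile


def can_import_with_pythonpath(module_name, extra_path):
    """Return True if a fresh interpreter can import `module_name` when
    `extra_path` is prepended to PYTHONPATH, mimicking an `export` in .bashrc."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = extra_path + os.pathsep + existing if existing else extra_path
    # The child interpreter reads PYTHONPATH at startup and adds it to sys.path.
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Throwaway stand-in for <repo root> containing a module1 package.
    with tempfile.TemporaryDirectory() as repo_root:
        os.makedirs(os.path.join(repo_root, "module1"))
        with open(os.path.join(repo_root, "module1", "__init__.py"), "w") as f:
            f.write("class Module1:\n    pass\n")
        print(can_import_with_pythonpath("module1", repo_root))  # True
```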

y2k-shubham