Note: This is NOT a duplicate of
I have to trigger certain tasks on remote systems from my Airflow DAG. The straightforward way to achieve this is SSHHook.
The problem is that the remote system is an EMR cluster which is itself created at runtime (by an upstream task) using EmrCreateJobFlowOperator. So while I can get hold of the job_flow_id of the launched EMR cluster (using XCOM), what I need is an ssh_conn_id to be passed to each downstream task.
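Roughly, this is how I picture resolving the master node's address from the job_flow_id; the task ids (create_emr_cluster, fetch_master_dns) are placeholders of mine, not anything from the docs:

```python
import boto3
from airflow.operators.python_operator import PythonOperator

def get_master_dns(**context):
    # EmrCreateJobFlowOperator pushes the job_flow_id as its return value
    job_flow_id = context['ti'].xcom_pull(task_ids='create_emr_cluster')  # placeholder task id
    emr = boto3.client('emr')
    cluster = emr.describe_cluster(ClusterId=job_flow_id)['Cluster']
    # the address downstream SSH tasks would need to connect to
    return cluster['MasterPublicDnsName']

fetch_master_dns = PythonOperator(
    task_id='fetch_master_dns',
    python_callable=get_master_dns,
    provide_context=True,
    dag=dag,  # the enclosing DAG, defined elsewhere
)
```

Getting the DNS name is the easy part; turning it into an ssh_conn_id that downstream tasks can use is where I'm stuck.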
Looking at the docs and code, it is evident that Airflow will try to look up this connection (using conn_id) in the db and in environment variables, so the problem boils down to being able to set either of these two properties at runtime (from within an operator).
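For the environment-variable route, my understanding is that Airflow resolves a conn_id from a variable named AIRFLOW_CONN_<CONN_ID> holding a connection URI, i.e. something like this (the host is just a placeholder):

```python
import os

# Airflow looks up the connection 'emr_ssh' in the variable AIRFLOW_CONN_EMR_SSH,
# which is expected to hold a connection URI
os.environ['AIRFLOW_CONN_EMR_SSH'] = 'ssh://hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:22'
```

But as far as I can tell, setting this inside an operator would only be visible to that task's own worker process, which is exactly the limitation I raise in the second bullet below.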
This seems like a rather common problem, because if it isn't achievable then the utility of EmrCreateJobFlowOperator would be severely hampered; but I haven't come across any example demonstrating it.
- Is it possible to create (and also destroy) either of these from within an Airflow operator?
- Connection (persisted in Airflow's db; a sketch of what I have in mind follows this list)
- Environment Variable (should be accessible to all downstream tasks and not just the current task, as discussed here)
- If not, what are my options?
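To make the first option concrete, this is the kind of thing I'm hoping is legitimate: creating (and later deleting) a Connection row from inside a PythonOperator callable. The conn_id 'emr_ssh', the login 'hadoop' and the upstream task id 'fetch_master_dns' are all placeholders of mine:

```python
from airflow import settings
from airflow.models import Connection

def register_emr_ssh_conn(**context):
    # placeholder upstream task that resolved the EMR master's DNS name
    master_dns = context['ti'].xcom_pull(task_ids='fetch_master_dns')
    session = settings.Session()
    # drop a stale connection with the same id, if one exists
    session.query(Connection).filter(Connection.conn_id == 'emr_ssh').delete()
    session.add(Connection(conn_id='emr_ssh', conn_type='ssh',
                           host=master_dns, login='hadoop', port=22))
    session.commit()
    session.close()

def deregister_emr_ssh_conn(**context):
    # teardown: remove the connection once the cluster is terminated
    session = settings.Session()
    session.query(Connection).filter(Connection.conn_id == 'emr_ssh').delete()
    session.commit()
    session.close()
```

Downstream SSHHook / SSHOperator tasks would then simply use ssh_conn_id='emr_ssh'. But I don't know whether mutating Airflow's metadata db from inside a task like this is actually supported or advisable, hence the question.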
I'm on
- Airflow v1.10
- Python 3.6.6
- emr-5.15
(can upgrade if required)