
With the command below, we can clear the failed tasks and rerun them in a single attempt:

airflow clear [-s START_DATE] [-e END_DATE] --only_failed dag_id
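
For example (the dag_id and dates here are placeholders):

     airflow clear -s 2020-08-01 -e 2020-08-16 --only_failed my_dag_id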

Is there any way to get the information of all the failed tasks from all the DAGs and export it to a file (Excel or text)?

Ravi

1 Answer


Here's an untested code snippet that should help you:

  • Obtain the list of failed TaskInstances (you can modify this to add filters like dag_id and start_date; a filtered sketch follows this list)

     from typing import List, Optional
     from airflow.models.taskinstance import TaskInstance
     from airflow.utils.state import State
     from airflow.settings import Session
     from airflow.utils.db import provide_session
    
     @provide_session
     def get_failed_task_instances(session: Optional[Session] = None) -> List[TaskInstance]:
         """
         Returns list of failed TaskInstance(s)
          - for all DAGs since inception of time
          - sorted by (1) dag_id ASC (2) start_date DESC
         :param session: Optional[Session]
         :return: List[TaskInstance]
         """
         failed_task_instances: List[TaskInstance] = session.query(TaskInstance). \
             filter(TaskInstance.state == State.FAILED). \
             order_by(TaskInstance.dag_id.asc(), TaskInstance.start_date.desc()). \
             all()
         return failed_task_instances
    
  • A utility function to extract relevant bits from a TaskInstance, like dag_id, start_date & task_id (change it as per your need)

     def ti_to_string(ti: TaskInstance) -> List[str]:
         """
         Converts a TaskInstance in List[str] by extracting relevant bits of info from it
         :param ti: TaskInstance
         :return: List[str]
         """
         # start_date is a datetime; stringify it so the List[str] return type holds
         return [ti.dag_id, str(ti.start_date), ti.task_id]
    
  • Putting it all together: writing data to the output CSV file (a usage example follows the list)

     import csv
     def write_failed_task_instances_to_csv(output_file_path: str) -> None:
         """
         Writes list of failed tasks in the provided output CSV filepath
         :param output_file_path:
         :return: None
         """
         # prepare list of failed TaskInstance(s)
         failed_task_instances: List[TaskInstance] = get_failed_task_instances()
         # extract relevant bits of info from TaskInstance(s) list (to serialize them)
         failed_task_instances_data: List[List[str]] = list(map(ti_to_string, failed_task_instances))
         # write data of failed TaskInstance(s) to output CSV filepath
         with open(output_file_path, "w", newline="") as f:
             writer = csv.writer(f)
             # header row matching the fields extracted by ti_to_string()
             writer.writerow(["dag_id", "start_date", "task_id"])
             writer.writerows(failed_task_instances_data)
    
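As hinted in the first bullet, the query can be narrowed down; here's an equally untested sketch of a variant filtered by dag_id and start_date (the function name and the cut-off window are my own):

     from datetime import datetime, timedelta
     from typing import List, Optional

     from airflow.models.taskinstance import TaskInstance
     from airflow.settings import Session
     from airflow.utils.db import provide_session
     from airflow.utils.state import State

     @provide_session
     def get_failed_task_instances_filtered(dag_id: str,
                                            started_after: datetime,
                                            session: Optional[Session] = None) -> List[TaskInstance]:
         """
         Variant of get_failed_task_instances() restricted to a single DAG
         and to TaskInstance(s) that started after the given datetime
         """
         return session.query(TaskInstance). \
             filter(TaskInstance.state == State.FAILED). \
             filter(TaskInstance.dag_id == dag_id). \
             filter(TaskInstance.start_date >= started_after). \
             order_by(TaskInstance.start_date.desc()). \
             all()

     # e.g. failures of a (hypothetical) DAG within the last 7 days
     # failed_tis = get_failed_task_instances_filtered("my_dag_id", datetime.utcnow() - timedelta(days=7))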

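With the pieces above in place, the export is a one-liner; run it with plain python on a machine where Airflow is installed and can reach the metadata DB (the output path below is just an example):

     # export all failed TaskInstance(s) into a CSV file (path is an example)
     write_failed_task_instances_to_csv("/tmp/failed_task_instances.csv")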

y2k-shubham
  • Thanks a lot for the response. Is there a possibility we can get the last execution time for a particular DAG run? – Ravi Aug 16 '20 at 17:45
  • If I have a DAG like "dag_employee_info", can I get the latest completed run timestamp for this DAG? Any suggestions please? – Ravi Aug 16 '20 at 18:11
  • **@Ravi** last execution time of DAG **[1]** you can build it yourself by exploiting `SQLAlchemy` [`DagRun` model](https://github.com/apache/airflow/blob/21021228759da8d3e98ca3f6d0922a6e9a0b5e68/airflow/models/dagrun.py#L44) as in the above answer. **[2]** you can take the first line from the output of the [`list_dag_runs`](https://airflow.apache.org/docs/stable/cli-ref#list_dag_runs) CLI command **[3]** A quick [Google search](https://www.google.com/search?q=airflow+get+last+dag+run) yields results like [this](https://stackoverflow.com/q/57607042/3679900) and [this](https://stackoverflow.com/q/51923684/3679900) – y2k-shubham Aug 16 '20 at 19:27
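
For option **[1]** above, an untested sketch in the same style as the answer (the DAG name comes from the comments):

     from typing import Optional

     from airflow.models.dagrun import DagRun
     from airflow.settings import Session
     from airflow.utils.db import provide_session
     from airflow.utils.state import State

     @provide_session
     def get_latest_successful_dag_run(dag_id: str, session: Optional[Session] = None) -> Optional[DagRun]:
         """
         Returns the most recent successful DagRun of the given DAG (None if it never succeeded)
         """
         return session.query(DagRun). \
             filter(DagRun.dag_id == dag_id). \
             filter(DagRun.state == State.SUCCESS). \
             order_by(DagRun.execution_date.desc()). \
             first()

     # latest_run = get_latest_successful_dag_run("dag_employee_info")
     # print(latest_run.execution_date if latest_run else "no successful run yet")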