
How can I use a Jupyter Notebook as a node in a Kedro pipeline? This is different from converting functions from Jupyter Notebooks into Kedro nodes: what I want is to use the full notebook as the node.

MCK

2 Answers


Although this is technically possible (via nbconvert, for example), it is strongly discouraged for several reasons, including the lack of testability and reproducibility of notebooks.

The best practice is usually to keep your pipeline node functions pure (where applicable), meaning that they don't incur any side effects. The way notebooks work generally contradicts that principle.
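
To illustrate, here is a minimal sketch of a pure node function in Kedro. The function and the dataset names ("raw_data", "clean_data") are illustrative assumptions, not part of the question; the dataset names would need matching entries in the project's catalog.yml:

import pandas as pd
from kedro.pipeline import Pipeline, node

def drop_missing_rows(raw_data: pd.DataFrame) -> pd.DataFrame:
    # Pure function: the output depends only on the input; there is
    # no file I/O and no global state, so it is easy to unit-test
    # and reproduce.
    return raw_data.dropna()

# "raw_data" and "clean_data" are assumed dataset names registered
# in catalog.yml.
preprocessing = Pipeline(
    [node(drop_missing_rows, inputs="raw_data", outputs="clean_data")]
)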

Dmitry Deryabin

AFAIK Kedro doesn't support this, but Ploomber does (disclaimer: I'm the author). Tasks can be notebooks, scripts, functions, or any combination of them. You can run pipelines locally, on Airflow, or on Kubernetes (using Argo Workflows).

When a notebook or script is used as a pipeline task, Ploomber creates an executed copy of it on each run. For example, you can write functions to pre-process your data and add a final task that trains a model in a notebook; this way you can leverage the .ipynb format to generate reports of your model-training procedure.

This is what a pipeline declaration looks like:

tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv

  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
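
Assuming the declaration above is saved as pipeline.yaml (Ploomber's default entry point), you would then run the pipeline from the command line; each run executes the notebooks and saves the executed copies (output.html, another.html) alongside the data products:

ploomber build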


Edu