I am able to run my pipelines using the kedro run command without issue. For some reason, though, I can no longer access my context and catalog from Jupyter Notebook. When I run kedro jupyter notebook and start a new (or existing) notebook, selecting my project name under "New", I get the following errors:


NameError: name 'context' is not defined


NameError: name 'catalog' is not defined


After running the magic command %kedro_reload, I can see that init_spark_session in my ProjectContext is looking for files in project_name/notebooks instead of project_name/src. I tried changing the working directory in my Jupyter Notebook session with %cd ../src and os.chdir('../src'), but kedro still looks in the notebooks folder:


java.io.FileNotFoundException: File file:/Users/user_name/Documents/app_name/kedro/notebooks/dist/project_name-0.1-py3.8.egg does not exist

_spark_session.sparkContext.addPyFile() is looking in the wrong place. When I comment out this line from my ProjectContext this error goes away but I receive another one about not being able to find my Oracle driver when trying to load a dataset from the catalog:

df = catalog.load('dataset')

java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver


For reference:


    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config defined in project's conf folder."""

        # Load the spark configuration in spark.yaml using the config loader
        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
        _spark_session = spark_session_conf.getOrCreate()


# You can define spark specific configuration here.

spark.driver.maxResultSize: 8g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# https://kedro.readthedocs.io/en/stable/11_tools_integration/01_pyspark.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR

# JDBC driver
spark.jars: drivers/ojdbc8-
Pierre Delecto

1 Answer


I think a combination of the following might help you:

  • Generally, avoid manually changing the current working directory, so remove os.chdir from your notebook. Construct absolute paths where possible.
  • In your init_spark_session, pass an absolute path to addPyFile. self.project_path points to the root directory of your Kedro project, so you can use it to construct the path to your PyFile, e.g. _spark_session.sparkContext.addPyFile(f'{self.project_path}/src/dist/project_name-{__version__}-py3.8.egg')

I'm not sure why you need to add the PyFile, though; maybe you have a specific reason.
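As a rough illustration of the absolute-path suggestion, here is a minimal sketch that builds paths from the project root rather than from the notebook's working directory. The helper names egg_path and absolutize_jars, the py_tag default, and the example.jar file name are hypothetical and not part of Kedro or Spark. The second helper rests on an assumption: that the Oracle driver error comes from the relative spark.jars entry being resolved against the notebooks folder, which the question does not confirm.

```python
from pathlib import Path


def egg_path(project_path, package_name, version, py_tag="py3.8"):
    """Build an absolute path to the packaged egg under <project>/src/dist.

    Hypothetical helper: the file-name pattern mirrors the one in the question.
    """
    root = Path(project_path).resolve()
    return root / "src" / "dist" / f"{package_name}-{version}-{py_tag}.egg"


def absolutize_jars(spark_params, project_path):
    """Return a copy of the Spark parameters with relative spark.jars entries
    anchored at the project root.

    Assumption: jar paths in spark.yml are written relative to the project root.
    Entries that are already absolute, or that are URLs, are left untouched.
    """
    root = Path(project_path).resolve()
    params = dict(spark_params)
    jars = params.get("spark.jars")
    if jars:
        fixed = []
        for jar in (j.strip() for j in jars.split(",")):
            if "://" in jar or Path(jar).is_absolute():
                fixed.append(jar)
            else:
                fixed.append(str(root / jar))
        params["spark.jars"] = ",".join(fixed)
    return params
```

Inside init_spark_session, these could be called with self.project_path, for example by passing egg_path(self.project_path, "project_name", __version__) (converted with str) to addPyFile, and by running the loaded parameters through absolutize_jars before building the SparkConf.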

Lim H.