Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

90 questions
2
votes
1 answer

Does kedro support tfrecord?

To train TensorFlow Keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a TFRecord dataset using tf.data.Dataset. This way the data is never stored locally; instead, the raw images are transformed directly…
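Kedro does not appear to ship a dedicated TFRecord dataset, so a thin custom dataset built on AbstractDataSet is the usual workaround. A minimal sketch, assuming TensorFlow 2.x eager mode; the class name TFRecordDataSet and the save logic are illustrative, not part of Kedro:

    # Hypothetical custom dataset wrapping TFRecord files; not part of Kedro itself.
    from kedro.io import AbstractDataSet
    import tensorflow as tf

    class TFRecordDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def _load(self) -> tf.data.TFRecordDataset:
            # Return the raw record stream; feature parsing is left to downstream nodes.
            return tf.data.TFRecordDataset(self._filepath)

        def _save(self, data: tf.data.Dataset) -> None:
            # Assumes `data` yields serialized tf.train.Example byte strings.
            with tf.io.TFRecordWriter(self._filepath) as writer:
                for record in data:
                    writer.write(record.numpy())

        def _describe(self) -> dict:
            return dict(filepath=self._filepath)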
2
votes
1 answer

Does Kedro support Checkpointing/Caching of Results?

Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes must be executed again. Does Kedro…
Sir ExecLP
  • 83
  • 1
  • 5
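Kedro does not re-run pipelines from a cache automatically, but any intermediate output that is given a file-backed catalog entry effectively becomes a checkpoint: later runs can start from the nodes downstream of it (e.g. with kedro run --from-nodes). A minimal sketch, assuming a Kedro version where pandas datasets live under kedro.extras.datasets:

    # Giving an intermediate output a persisted dataset (instead of the default
    # in-memory one) makes it survive between runs, so downstream nodes can be
    # re-executed without recomputing it.
    from kedro.extras.datasets.pandas import CSVDataSet
    from kedro.io import DataCatalog

    catalog = DataCatalog({
        "features": CSVDataSet(filepath="data/04_feature/features.csv"),
    })

The equivalent entry in conf/base/catalog.yml has the same effect.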
2
votes
1 answer

Passing nested parameters in the extra_params of the load_context in Kedro

I am trying to load a Kedro context with some extra parameters. My intention is to update the configs in parameters.yml with only the ones passed in extra_params (so the rest of the configs should remain the same). I will then use this instance of context…
Mohit
  • 985
  • 3
  • 16
  • 40
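A minimal sketch of the extra_params usage the question refers to, assuming a Kedro version whose load_context accepts it (the import path differs between 0.15.x and later releases); whether nested keys are deep-merged with parameters.yml or replace the whole top-level key depends on the version:

    from kedro.context import load_context  # import path varies by version

    context = load_context(
        ".",  # project root
        extra_params={"model_params": {"learning_rate": 0.02}},
    )
    print(context.params)  # inspect how extra_params were merged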
2
votes
2 answers

Is there IO functionality to store trained models in kedro?

In the IO section of the kedro API docs I could not find functionality w.r.t. storing trained models (e.g. .pkl, .joblib, ONNX, PMML). Have I missed something?
thinwybk
  • 2,493
  • 16
  • 40
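Kedro has no dedicated "model" dataset type, but pickle-backed datasets cover the common case; the class name and import path depend on the version (PickleLocalDataSet in 0.15.x, kedro.extras.datasets.pickle.PickleDataSet later), and joblib support via a backend argument is version-dependent. A hedged sketch:

    # Sketch: persisting a fitted scikit-learn model via a pickle-backed dataset.
    from kedro.extras.datasets.pickle import PickleDataSet  # path varies by version
    from sklearn.linear_model import LinearRegression

    model_dataset = PickleDataSet(filepath="data/06_models/model.pkl")
    model_dataset.save(LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0]))
    model = model_dataset.load()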
2
votes
1 answer

How do I add many CSV files to the catalog in Kedro?

I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep…
Srikiran
  • 165
  • 1
  • 2
  • 7
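Newer Kedro releases ship a PartitionedDataSet that points at a directory and exposes every matching file under a single catalog entry. A sketch, assuming such a version is available (dataset import paths vary):

    from kedro.io import PartitionedDataSet
    from kedro.extras.datasets.pandas import CSVDataSet

    raw_csvs = PartitionedDataSet(
        path="data/01_raw",
        dataset=CSVDataSet,
        filename_suffix=".csv",
    )
    partitions = raw_csvs.load()   # {"1": <load callable>, "2": <load callable>, ...}
    first_df = partitions["1"]()   # each value lazily loads one file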
2
votes
1 answer

How to deploy a Kedro project and run it in a new environment after the kedro package command?

I have used an already-built pipeline using the iris data and created a wheel and an egg file using "kedro package". After this I created a Python virtual environment and installed both the wheel and egg files there. I tried to run the pipeline file from…
Harish
  • 21
  • 1
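kedro package only ships the project's source package, so the conf/ (and usually data/) directories have to be copied to the target environment separately. A hedged sketch of triggering a run against such a directory after installing the wheel (the import path of load_context varies by version, and the directory must still contain the project's Kedro metadata and conf/ folder; the path below is illustrative):

    from kedro.context import load_context  # import path varies by version

    # "/opt/deployed_project" is a hypothetical directory holding conf/ and data/
    context = load_context("/opt/deployed_project")
    context.run()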
2
votes
1 answer

Kedro - how to pass nested parameters directly to node

kedro recommends storing parameters in conf/base/parameters.yml. Let's assume it looks like this:

    step_size: 1
    model_params:
      learning_rate: 0.01
      test_data_ratio: 0.2
      num_train_steps: 10000

And now imagine I have some data_engineering…
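Parameters reach nodes through the params: prefix in a node's inputs; passing params:model_params injects the whole nested block as a dict (dotted access to individual nested keys is version-dependent). A minimal sketch with the parameters above:

    from kedro.pipeline import Pipeline, node

    def train_model(train_data, model_params: dict):
        learning_rate = model_params["learning_rate"]
        ...  # fit and return the model

    pipeline = Pipeline([
        node(train_model, ["train_data", "params:model_params"], "model"),
    ])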
2
votes
1 answer

Loading data using sparkJDBCDataset with jars not working

When using a sparkJDBCDataset to load a table using a JDBC connection, I keep running into the error that Spark cannot find my driver. The driver definitely exists on the machine and its directory is specified inside the spark.yml file under…
Weiyi Yin
  • 60
  • 4
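The usual cause is that the driver jar is not on the JVM classpath when the SparkSession is created, so the JDBC dataset never sees it. A hedged sketch of the relevant Spark options (the jar path and name are illustrative; in a Kedro project these typically come from spark.yml and are applied wherever the session is built):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("kedro")
        .config("spark.jars", "/opt/jars/postgresql-42.2.9.jar")
        .config("spark.driver.extraClassPath", "/opt/jars/postgresql-42.2.9.jar")
        .getOrCreate()
    )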
2
votes
1 answer

Convert csv into parquet in kedro

I have a pretty big CSV that would not fit into memory, and I need to convert it into a .parquet file to work with vaex. Here is my catalog:

    raw_data:
      type: kedro.contrib.io.pyspark.SparkDataSet
      filepath: data/01_raw/data.csv
      file_format:…
eawer
  • 1,265
  • 2
  • 12
  • 20
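One pattern is to let the catalog do the conversion: a pass-through node reads the CSV through a Spark-backed dataset and writes its unchanged output through a parquet-backed one. A sketch using the Python API with the same dataset class the question uses (class paths differ in newer Kedro versions):

    from kedro.contrib.io.pyspark import SparkDataSet  # path as in the question
    from kedro.io import DataCatalog
    from kedro.pipeline import Pipeline, node
    from kedro.runner import SequentialRunner

    def passthrough(df):
        return df  # the datasets handle the csv -> parquet conversion

    catalog = DataCatalog({
        "raw_data": SparkDataSet(filepath="data/01_raw/data.csv",
                                 file_format="csv",
                                 load_args={"header": True}),
        "parquet_data": SparkDataSet(filepath="data/02_intermediate/data.parquet",
                                     file_format="parquet"),
    })

    pipeline = Pipeline([node(passthrough, "raw_data", "parquet_data")])
    SequentialRunner().run(pipeline, catalog)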
2
votes
1 answer

Setting parameters in Kedro Notebook

Is it possible to overwrite properties taken from the parameters.yaml file within a Kedro notebook? I am trying to dynamically change parameter values within a notebook. I would like to be able to give users the ability to run a standard pipeline…
DHollett
  • 23
  • 2
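In a notebook started with kedro jupyter notebook, the context variable is already available, and parameters are registered in the catalog under params:<name> entries. A sketch, assuming that layout: override the entry on the catalog object and run the pipeline against that catalog explicitly (calling context.run() would rebuild the catalog and discard in-place edits):

    from kedro.runner import SequentialRunner

    catalog = context.catalog  # `context` is injected by the Kedro notebook
    catalog.add_feed_dict({"params:test_data_ratio": 0.3}, replace=True)
    SequentialRunner().run(context.pipeline, catalog)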
2
votes
3 answers

Kedro deployment to databricks

Maybe I misunderstand the purpose of packaging, but it doesn't seem too helpful in creating an artifact for production deployment because it only packages code. It leaves out the conf, data, and other directories that make the kedro project…
dres
  • 1,053
  • 8
  • 13
2
votes
1 answer

How to use Kedro with Pipenv?

I am currently using kedro, version 0.15.4 with pipenv, version 2018.11.26. At the moment, I have to do the following if I want to use Pipenv (For this example, I want this project to reside in the kedro-pipenv directory): mkdir kedro-pipenv && cd…
jayBana
  • 325
  • 2
  • 9
2
votes
1 answer

Running pipelines with data parallellization

I've been running the kedro tutorials (the hello world and the spaceflight) and I'm wondering if it's easily possible to do data parallelization using Kedro. Imagine the situation where I have a node that needs to be executed in millions of…
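Kedro's built-in parallelism is at the node level via ParallelRunner; data parallelism inside a single node (e.g. over millions of records) still has to be implemented by the node itself, for instance with multiprocessing or Spark. A minimal sketch of the runner switch:

    from kedro.context import load_context  # import path varies by version
    from kedro.runner import ParallelRunner

    context = load_context(".")
    context.run(runner=ParallelRunner())    # roughly what `kedro run --parallel` does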
2
votes
1 answer

Kedro: How to pass multiple files of the same data from a directory as a node input?

I have a directory with multiple files in the same data format (one file per day). It's like one dataset split into multiple files. Is it possible to pass all the files to a Kedro node without specifying each file? So they all get processed…
921Kiyo
  • 512
  • 3
  • 9
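If the directory is registered as a single PartitionedDataSet entry (available in newer Kedro releases), the node receives a dict mapping partition ids to lazy load callables and can combine them itself. A sketch of that node side:

    import pandas as pd

    def concat_partitions(partitions: dict) -> pd.DataFrame:
        # Each value is a zero-argument callable that loads one file on demand.
        return pd.concat(
            [load() for load in partitions.values()],
            ignore_index=True,
        )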
2
votes
1 answer

Are S3 Kedro datasets thread-safe?

CSVS3DataSet/HDFS3DataSet use boto3, which is known not to be thread-safe (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing). Is it OK to use these…