Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

90 questions
2
votes
1 answer
Does Kedro support TFRecord?
To train TensorFlow Keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a TFRecord dataset using tf.data.Dataset. This way, the data is never stored locally; instead, the raw images are transformed directly…
evolved · 1,071
2
votes
1 answer
Does Kedro support Checkpointing/Caching of Results?
Let's say we have multiple long running pipeline nodes.
It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes must be executed again.
Does Kedro…
Sir ExecLP · 83
2
votes
1 answer
Passing nested parameters in the extra_params of the load_context in Kedro
I am trying to load a Kedro context with some extra parameters. My intention is to update the configs in parameters.yml with only the ones passed in extra_params (so the rest of the configs should remain the same). I will then use this instance of context…
Mohit · 985
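The behaviour the asker describes (override only the keys present in extra_params, keep everything else) amounts to a recursive dict merge. A minimal sketch in plain Python — the function name and structure here are my own illustration, not Kedro's API:

```python
def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively update nested dicts, leaving untouched keys intact."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)  # descend into nested sections
        else:
            base[key] = value  # leaf value: override in place
    return base

params = {"step_size": 1, "model": {"lr": 0.01, "epochs": 10}}
deep_update(params, {"model": {"lr": 0.001}})
# "lr" is overridden; "epochs" and "step_size" survive untouched
```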
2
votes
2 answers
Is there IO functionality to store trained models in kedro?
In the IO section of the Kedro API docs I could not find functionality for storing trained models (e.g. .pkl, .joblib, ONNX, PMML). Have I missed something?
thinwybk · 2,493
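Kedro does cover this case: a trained model can be persisted like any other dataset via a pickle-backed catalog entry. A sketch of what such an entry might look like — the exact class path varies by version (roughly `PickleLocalDataSet` in older releases, `pickle.PickleDataSet` in newer ones), and the entry name and filepath here are illustrative:

```yaml
regressor:
  type: pickle.PickleDataSet
  filepath: data/06_models/regressor.pkl
```

A node that returns the fitted model object and lists `regressor` as its output will then have the model saved automatically on each run.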
2
votes
1 answer
How do I add many CSV files to the catalog in Kedro?
I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep…
Srikiran · 165
2
votes
1 answer
How do I deploy and run a Kedro project in a new environment after the kedro package command?
I used the already-built pipeline with the iris data and created a wheel and an egg file using "kedro package". After this I created a virtual environment with Python and installed both the wheel and egg files there. I tried to run the pipeline file from…
Harish · 21
2
votes
1 answer
Kedro - how to pass nested parameters directly to a node
Kedro recommends storing parameters in conf/base/parameters.yml. Let's assume it looks like this:
step_size: 1
model_params:
  learning_rate: 0.01
  test_data_ratio: 0.2
  num_train_steps: 10000
And now imagine I have some data_engineering…
Mark Fingerhuth · 143
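Recent Kedro versions let a node subscribe to a nested key directly via the `params:` prefix, e.g. `inputs="params:model_params.learning_rate"` (support for the dotted form is version-dependent; check your release notes). The lookup itself is just a dotted traversal over the parameters dict, which can be sketched in plain Python:

```python
def get_param(params: dict, dotted_key: str):
    """Resolve a key like 'model_params.learning_rate' against a nested dict."""
    value = params
    for part in dotted_key.split("."):
        value = value[part]  # descend one level per dot-separated segment
    return value

params = {"step_size": 1, "model_params": {"learning_rate": 0.01}}
get_param(params, "model_params.learning_rate")  # → 0.01
```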
2
votes
1 answer
Loading data using SparkJDBCDataSet with jars not working
When using a SparkJDBCDataSet to load a table over a JDBC connection, I keep running into an error that Spark cannot find my driver. The driver definitely exists on the machine, and its directory is specified inside the spark.yml file under…
Weiyi Yin · 60
2
votes
1 answer
Convert CSV into Parquet in Kedro
I have a pretty big CSV that would not fit into memory, and I need to convert it into a .parquet file to work with vaex.
Here is my catalog:
raw_data:
  type: kedro.contrib.io.pyspark.SparkDataSet
  filepath: data/01_raw/data.csv
  file_format:…
eawer · 1,265
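With Spark-backed datasets the conversion is usually a catalog-level concern: declare the CSV as one dataset, declare a second dataset with `file_format: parquet`, and connect them with an identity node so Spark streams the data through without loading it all into memory. A sketch of the two entries — the dataset class path matches the `kedro.contrib` vintage in the question and differs in newer releases, and the entry names and paths are illustrative:

```yaml
raw_data:
  type: kedro.contrib.io.pyspark.SparkDataSet
  filepath: data/01_raw/data.csv
  file_format: csv
  load_args:
    header: true

parquet_data:
  type: kedro.contrib.io.pyspark.SparkDataSet
  filepath: data/02_intermediate/data.parquet
  file_format: parquet
```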
2
votes
1 answer
Setting parameters in Kedro Notebook
Is it possible to overwrite properties taken from the parameters.yml file within a Kedro notebook?
I am trying to dynamically change parameter values within a notebook. I would like to be able to give users the ability to run a standard pipeline…
DHollett · 23
2
votes
3 answers
Kedro deployment to databricks
Maybe I misunderstand the purpose of packaging, but it doesn't seem too helpful in creating an artifact for production deployment because it only packages code. It leaves out the conf, data, and other directories that make the kedro project…
dres · 1,053
2
votes
1 answer
How to use Kedro with Pipenv?
I am currently using Kedro 0.15.4 with Pipenv 2018.11.26.
At the moment, I have to do the following if I want to use Pipenv (For this example, I want this project to reside in the kedro-pipenv directory):
mkdir kedro-pipenv && cd…
jayBana · 325
2
votes
1 answer
Running pipelines with data parallelization
I've been running the kedro tutorials (the hello world and the spaceflight) and I'm wondering if it's easily possible to do data parallelization using Kedro.
Imagine, the situation where I have a node that needs to be executed in millions of…
Tiago Freitas Pereira · 653
2
votes
1 answer
Kedro: How to pass multiple files of the same data from a directory as a node input?
I have a directory with multiple files in the same data format (one file per day); it's like one dataset split into multiple files.
Is it possible to pass all the files to a Kedro node without specifying each file, so they all get processed…
921Kiyo · 512
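Later Kedro releases address this directly with `PartitionedDataSet`, which exposes every file in a directory as a single catalog entry; the node then receives a dict mapping partition id to a load function. A sketch of such an entry — availability and the exact type path depend on your Kedro version, and the entry name, path, and suffix here are illustrative:

```yaml
daily_data:
  type: PartitionedDataSet
  path: data/01_raw/daily/
  dataset: pandas.CSVDataSet
  filename_suffix: .csv
```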
2
votes
1 answer
Are S3 Kedro datasets thread-safe?
CSVS3DataSet/HDFS3DataSet use boto3, which is known not to be thread-safe (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing).
Is it OK to use these…
Anton Kirilenko · 117