Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

90 questions
2
votes
1 answer

Does kedro support tfrecord?

To train TensorFlow Keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a TFRecord dataset using tf.data.Dataset. This way the data is never stored locally; instead, the raw images are transformed directly…
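Kedro does not appear to ship a dedicated TFRecord dataset, so a thin custom dataset built on AbstractDataSet is the usual workaround. A minimal sketch, assuming TensorFlow 2.x eager mode; the class name TFRecordDataSet and the save logic are illustrative, not part of Kedro:

    # Hypothetical custom dataset wrapping TFRecord files; not part of Kedro itself.
    from kedro.io import AbstractDataSet
    import tensorflow as tf

    class TFRecordDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def _load(self) -> tf.data.TFRecordDataset:
            # Return the raw record stream; feature parsing is left to downstream nodes.
            return tf.data.TFRecordDataset(self._filepath)

        def _save(self, data: tf.data.Dataset) -> None:
            # Assumes `data` yields serialized tf.train.Example byte strings.
            with tf.io.TFRecordWriter(self._filepath) as writer:
                for record in data:
                    writer.write(record.numpy())

        def _describe(self) -> dict:
            return dict(filepath=self._filepath)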
2
votes
1 answer

Does Kedro support Checkpointing/Caching of Results?

Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes must be executed again. Does Kedro…
Sir ExecLP
  • 83
  • 1
  • 5
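Kedro does not re-run pipelines from a cache automatically, but any intermediate output that is given a file-backed catalog entry effectively becomes a checkpoint: later runs can start from the nodes downstream of it (e.g. with kedro run --from-nodes). A minimal sketch, assuming a Kedro version where pandas datasets live under kedro.extras.datasets:

    # Giving an intermediate output a persisted dataset (instead of the default
    # in-memory one) makes it survive between runs, so downstream nodes can be
    # re-executed without recomputing it.
    from kedro.extras.datasets.pandas import CSVDataSet
    from kedro.io import DataCatalog

    catalog = DataCatalog({
        "features": CSVDataSet(filepath="data/04_feature/features.csv"),
    })

The equivalent entry in conf/base/catalog.yml has the same effect.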
2
votes
1 answer

Passing nested parameters in the extra_params of the load_context in Kedro

I am trying to load a Kedro context with some extra parameters. My intention is to update the configs in parameters.yml with only the ones passed in extra_params (so the rest of the configs should remain the same). I will then use this instance of context…
Mohit
  • 985
  • 3
  • 16
  • 40
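A minimal sketch of the extra_params usage the question refers to, assuming a Kedro version whose load_context accepts it (the import path differs between 0.15.x and later releases); whether nested keys are deep-merged with parameters.yml or replace the whole top-level key depends on the version:

    from kedro.context import load_context  # import path varies by version

    context = load_context(
        ".",  # project root
        extra_params={"model_params": {"learning_rate": 0.02}},
    )
    print(context.params)  # inspect how extra_params were merged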
2
votes
2 answers

Is there IO functionality to store trained models in kedro?

In the IO section of the kedro API docs I could not find functionality w.r.t. storing trained models (e.g. .pkl, .joblib, ONNX, PMML). Have I missed something?
thinwybk
  • 2,493
  • 16
  • 40
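Kedro has no dedicated "model" dataset type, but pickle-backed datasets cover the common case; the class name and import path depend on the version (PickleLocalDataSet in 0.15.x, kedro.extras.datasets.pickle.PickleDataSet later), and joblib support via a backend argument is version-dependent. A hedged sketch:

    # Sketch: persisting a fitted scikit-learn model via a pickle-backed dataset.
    from kedro.extras.datasets.pickle import PickleDataSet  # path varies by version
    from sklearn.linear_model import LinearRegression

    model_dataset = PickleDataSet(filepath="data/06_models/model.pkl")
    model_dataset.save(LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0]))
    model = model_dataset.load()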
2
votes
1 answer

How do I add many CSV files to the catalog in Kedro?

I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep…
Srikiran
  • 165
  • 1
  • 2
  • 7
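Newer Kedro releases ship a PartitionedDataSet that points at a directory and exposes every matching file under a single catalog entry. A sketch, assuming such a version is available (dataset import paths vary):

    from kedro.io import PartitionedDataSet
    from kedro.extras.datasets.pandas import CSVDataSet

    raw_csvs = PartitionedDataSet(
        path="data/01_raw",
        dataset=CSVDataSet,
        filename_suffix=".csv",
    )
    partitions = raw_csvs.load()   # {"1": <load callable>, "2": <load callable>, ...}
    first_df = partitions["1"]()   # each value lazily loads one file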
2
votes
1 answer

How to deploy a Kedro project and run it in a new environment after the kedro package command?

I have used an already-built pipeline using the iris data and created a wheel and an egg file using "kedro package". After this I created a Python virtual environment and installed both the wheel and egg files there. I tried to run the pipeline file from…
Harish
  • 21
  • 1
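kedro package only ships the project's source package, so the conf/ (and usually data/) directories have to be copied to the target environment separately. A hedged sketch of triggering a run against such a directory after installing the wheel (the import path of load_context varies by version, and the directory must still contain the project's Kedro metadata and conf/ folder; the path below is illustrative):

    from kedro.context import load_context  # import path varies by version

    # "/opt/deployed_project" is a hypothetical directory holding conf/ and data/
    context = load_context("/opt/deployed_project")
    context.run()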
2
votes
1 answer

Kedro - how to pass nested parameters directly to node

kedro recommends storing parameters in conf/base/parameters.yml. Let's assume it looks like this:

    step_size: 1
    model_params:
      learning_rate: 0.01
      test_data_ratio: 0.2
      num_train_steps: 10000

And now imagine I have some data_engineering…
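Parameters reach nodes through the params: prefix in a node's inputs; passing params:model_params injects the whole nested block as a dict (dotted access to individual nested keys is version-dependent). A minimal sketch with the parameters above:

    from kedro.pipeline import Pipeline, node

    def train_model(train_data, model_params: dict):
        learning_rate = model_params["learning_rate"]
        ...  # fit and return the model

    pipeline = Pipeline([
        node(train_model, ["train_data", "params:model_params"], "model"),
    ])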
2
votes
1 answer

Loading data using sparkJDBCDataset with jars not working

When using a sparkJDBCDataset to load a table using a JDBC connection, I keep running into the error that Spark cannot find my driver. The driver definitely exists on the machine and its directory is specified inside the spark.yml file under…
Weiyi Yin
  • 60
  • 4
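The usual cause is that the driver jar is not on the JVM classpath when the SparkSession is created, so the JDBC dataset never sees it. A hedged sketch of the relevant Spark options (the jar path and name are illustrative; in a Kedro project these typically come from spark.yml and are applied wherever the session is built):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("kedro")
        .config("spark.jars", "/opt/jars/postgresql-42.2.9.jar")
        .config("spark.driver.extraClassPath", "/opt/jars/postgresql-42.2.9.jar")
        .getOrCreate()
    )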
2
votes
1 answer

Convert csv into parquet in kedro

I have a pretty big CSV that would not fit into memory, and I need to convert it into a .parquet file to work with vaex. Here is my catalog:

    raw_data:
      type: kedro.contrib.io.pyspark.SparkDataSet
      filepath: data/01_raw/data.csv
      file_format:…
eawer
  • 1,265
  • 2
  • 12
  • 20
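One pattern is to let the catalog do the conversion: a pass-through node reads the CSV through a Spark-backed dataset and writes its unchanged output through a parquet-backed one. A sketch using the Python API with the same dataset class the question uses (class paths differ in newer Kedro versions):

    from kedro.contrib.io.pyspark import SparkDataSet  # path as in the question
    from kedro.io import DataCatalog
    from kedro.pipeline import Pipeline, node
    from kedro.runner import SequentialRunner

    def passthrough(df):
        return df  # the datasets handle the csv -> parquet conversion

    catalog = DataCatalog({
        "raw_data": SparkDataSet(filepath="data/01_raw/data.csv",
                                 file_format="csv",
                                 load_args={"header": True}),
        "parquet_data": SparkDataSet(filepath="data/02_intermediate/data.parquet",
                                     file_format="parquet"),
    })

    pipeline = Pipeline([node(passthrough, "raw_data", "parquet_data")])
    SequentialRunner().run(pipeline, catalog)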
2
votes
1 answer

Setting parameters in Kedro Notebook

Is it possible to overwrite properties taken from the parameters.yaml file within a Kedro notebook? I am trying to dynamically change parameter values within a notebook. I would like to be able to give users the ability to run a standard pipeline…
DHollett
  • 23
  • 2
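In a notebook started with kedro jupyter notebook, the context variable is already available, and parameters are registered in the catalog under params:<name> entries. A sketch, assuming that layout: override the entry on the catalog object and run the pipeline against that catalog explicitly (calling context.run() would rebuild the catalog and discard in-place edits):

    from kedro.runner import SequentialRunner

    catalog = context.catalog  # `context` is injected by the Kedro notebook
    catalog.add_feed_dict({"params:test_data_ratio": 0.3}, replace=True)
    SequentialRunner().run(context.pipeline, catalog)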
2
votes
3 answers

Kedro deployment to databricks

Maybe I misunderstand the purpose of packaging, but it doesn't seem too helpful in creating an artifact for production deployment because it only packages code. It leaves out the conf, data, and other directories that make the kedro project…
dres
  • 1,053
  • 8
  • 13
2
votes
1 answer

How to use Kedro with Pipenv?

I am currently using kedro, version 0.15.4 with pipenv, version 2018.11.26. At the moment, I have to do the following if I want to use Pipenv (For this example, I want this project to reside in the kedro-pipenv directory): mkdir kedro-pipenv && cd…
jayBana
  • 325
  • 2
  • 9
2
votes
1 answer

Running pipelines with data parallellization

I've been running the kedro tutorials (the hello world and the spaceflight) and I'm wondering if it's easily possible to do data parallelization using Kedro. Imagine the situation where I have a node that needs to be executed in millions of…
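Kedro's built-in parallelism is at the node level via ParallelRunner; data parallelism inside a single node (e.g. over millions of records) still has to be implemented by the node itself, for instance with multiprocessing or Spark. A minimal sketch of the runner switch:

    from kedro.context import load_context  # import path varies by version
    from kedro.runner import ParallelRunner

    context = load_context(".")
    context.run(runner=ParallelRunner())    # roughly what `kedro run --parallel` does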
2
votes
1 answer

Kedro: How to pass multiple files of the same data from a directory as a node input?

I have a directory with multiple files in the same data format (one file per day). It's like one dataset split into multiple files. Is it possible to pass all the files to a Kedro node without specifying each file? So they all get processed…
921Kiyo
  • 512
  • 3
  • 9
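If the directory is registered as a single PartitionedDataSet entry (available in newer Kedro releases), the node receives a dict mapping partition ids to lazy load callables and can combine them itself. A sketch of that node side:

    import pandas as pd

    def concat_partitions(partitions: dict) -> pd.DataFrame:
        # Each value is a zero-argument callable that loads one file on demand.
        return pd.concat(
            [load() for load in partitions.values()],
            ignore_index=True,
        )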
2
votes
1 answer

Are S3 Kedro datasets thread-safe?

CSVS3DataSet/HDFS3DataSet use boto3, which is known not to be thread-safe (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing). Is it OK to use these…