Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
90 questions
5 votes, 1 answer
DataBricks + Kedro Vs GCP + Kubeflow Vs Server + Kedro + Airflow
We are deploying a data consortium between more than 10 companies. We will deploy several machine learning models (advanced analytics models in general) for all the companies, and we will administer all the models. We are looking for a solution…
Erick Translateur
5 votes, 1 answer
How to process huge datasets in kedro
I have a pretty big (~200 GB, ~20M lines) raw jsonl dataset. I need to extract important properties from it and store the intermediate dataset as CSV for further conversion into something like HDF5, Parquet, etc. Obviously, I can't use JSONDataSet…
eawer
4 votes, 1 answer
Where to perform the saving of a node output in Kedro?
In Kedro, we can pipeline different nodes and partially run some of them. When we partially run some nodes, we need to save some outputs from the nodes somewhere so that when another node is run it can access the data that the previous node has…
Baenka
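In Kedro, saving is declarative rather than something a node does itself: if a node's output name has a catalog entry, the framework persists it after the node runs, and a later partial run loads it from the same entry. A sketch of such a catalog entry (the path and dataset type are illustrative, and type names vary by Kedro version):

```yaml
preprocessed_data:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/preprocessed.csv
```

Outputs without a catalog entry stay in memory only and are lost between runs.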
3 votes, 2 answers
Override nested parameters using kedro run CLI command
I am using nested parameters in my parameters.yml and would like to override these using runtime parameters for the kedro run CLI command:
train:
  batch_size: 32
  train_ratio: 0.9
  epochs: 5
The following doesn't seem to work:
kedro run…
evolved
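Whether `kedro run --params` accepts dotted keys for nested values depends on the Kedro version, so check the CLI docs for your release. Conceptually, the override has to walk the nested dict and set a leaf, which the following hypothetical helper (not part of Kedro's API) illustrates:

```python
def apply_override(params, dotted_key, value):
    """Set a nested key like 'train.batch_size' in a params dict,
    mimicking what a runtime parameter override must achieve."""
    keys = dotted_key.split(".")
    node = params
    for k in keys[:-1]:
        # Descend, creating intermediate dicts if they are missing.
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return params
```

Note that only the addressed leaf changes; sibling keys such as `train.epochs` keep their values from parameters.yml.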
3 votes, 1 answer
How can I read/write data from/to network attached storage with kedro?
In the API docs about kedro.io and kedro.contrib.io I could not find info about how to read/write data from/to network-attached storage such as a FritzBox NAS.
thinwybk
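One workable approach (an assumption on my part, not an official FritzBox integration): mount the NAS share on the local filesystem, e.g. via SMB, and point an ordinary catalog entry at the mount. The path and dataset type below are illustrative, and type names vary by Kedro version:

```yaml
nas_data:
  type: pandas.CSVDataSet
  filepath: /mnt/fritzbox_nas/data/input.csv
```

From Kedro's perspective this is just a local file, so no special dataset class is needed.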
3 votes, 1 answer
How to write a list of dataframes into multiple sheets of ExcelLocalDataSet?
The input is a list of dataframes. How can I save it into an ExcelLocalDataSet where each dataframe is a separate sheet?
James Wong
3 votes, 2 answers
Pipeline can't find nodes in kedro
I was following the pipelines tutorial, created all the needed files, and started Kedro with kedro run --node=preprocessing_data, but got stuck with this error message:
ValueError: Pipeline does not contain nodes named ['preprocessing_data'].
If I run kedro…
eawer
3 votes, 1 answer
Kedro with MongoDB and other document databases?
What's the best practice for using kedro with MongoDB or other document databases? MongoDB, for example, doesn't have a query language analogous to SQL. Most Mongo "queries" in Python (using PyMongo) will look something like this:
from pymongo…
Benjamin Jack
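Since Mongo queries are imperative PyMongo calls rather than something a catalog entry can declare, one common pattern (a sketch, not an official Kedro integration) is to run the query inside the node body and let Kedro handle only the returned records as an in-memory dataset:

```python
def fetch_recent(collection, since):
    """Hypothetical node body: query a PyMongo-style collection and return
    plain records that downstream nodes can consume."""
    return list(collection.find({"timestamp": {"$gte": since}}))
```

The connection itself would typically come from project configuration or a hook, keeping credentials out of node code.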
3 votes, 1 answer
How to convert Spark data frame to Pandas and back in Kedro?
I'm trying to understand the optimal way in Kedro to convert a Spark dataframe coming out of one node into the Pandas dataframe required as input for another node, without creating a redundant conversion step.
Dmitry Deryabin
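Kedro supports "transcoding" for exactly this case: the same file is registered twice with an `@` suffix, so one node can save it via Spark and the next can load it via pandas, with no explicit conversion node. A sketch (filepath and dataset type names are illustrative and vary by Kedro version):

```yaml
my_dataframe@spark:
  type: spark.SparkDataSet
  filepath: data/02_intermediate/df.parquet
  file_format: parquet

my_dataframe@pandas:
  type: pandas.ParquetDataSet
  filepath: data/02_intermediate/df.parquet
```

Nodes reference `my_dataframe@spark` or `my_dataframe@pandas`, and Kedro treats both entries as the same dataset for dependency resolution.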
3 votes, 1 answer
How to change the process count of the ParallelRunner in Kedro?
My pipeline makes a lot of HTTP requests. Since this is not a CPU-heavy operation, I'd like to spin up more processes than the number of CPU cores. How can I change this?
921Kiyo
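When constructed programmatically, Kedro's `ParallelRunner` accepts a worker-count argument (CLI support varies by version, so check the docs for your release). The underlying principle, that IO-bound work tolerates far more workers than CPU cores, can be shown with a generic stdlib sketch where all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(items, fetch, max_workers=32):
    """Run an IO-bound `fetch` over `items` with more workers than cores;
    threads spend most of their time waiting, not computing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))
```

For HTTP-heavy pipelines, threads (or async IO) inside a single node are often a simpler fit than multiplying processes.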
3 votes, 1 answer
How to run the nodes in sequence as declared in kedro pipeline?
In a Kedro pipeline, nodes (something like Python functions) are declared sequentially. In some cases, the input of one node is the output of the previous node. However, sometimes, when the kedro run API is called on the command line, the nodes are not run…
Baenka
2 votes, 1 answer
Kedro context and catalog missing from Jupyter Notebook
I am able to run my pipelines using the kedro run command without issue. For some reason, though, I can't access my context and catalog from a Jupyter Notebook anymore. When I run kedro jupyter notebook and start a new (or existing) notebook using my…
Pierre Delecto
2 votes, 1 answer
PartitionedDataSet not found when Kedro pipeline is run in Docker
I have multiple text files in an S3 bucket which I read and process. So, I defined a PartitionedDataSet in the Kedro data catalog, which looks like this:
raw_data:
  type: PartitionedDataSet
  path: s3://reads/raw
  dataset: pandas.CSVDataSet
  load_args:
    …
mendo
2 votes, 1 answer
How to use tf.data.Dataset with kedro?
I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With Kedro, is there a way to create a node and return the created tf.data.Dataset to use it in the next training node?
The MemoryDataset will…
evolved
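The usual obstacle is that MemoryDataSet deep-copies data between nodes by default, which fails for objects like a tf.data.Dataset. In recent Kedro versions the copy mode can be relaxed so the object is passed by reference; a sketch of such a catalog entry:

```yaml
train_dataset:
  type: MemoryDataSet
  copy_mode: assign
```

With `assign`, downstream nodes receive the same object rather than a copy, which suits non-copyable streaming datasets.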
2 votes, 1 answer
How to catalog datasets & models by S3 URI, but keep a local copy?
I'm trying to figure out how to store intermediate Kedro pipeline objects both locally AND on S3. In particular, say I have a dataset on S3:
my_big_dataset.hdf5:
  type: kedro.extras.datasets.pandas.HDFDataSet
  filepath:…
crypdick