Highest Voted 'kedro' Questions

0 votes, 0 answers

Kedro-mlflow usage - when to use it from notebooks, and when from a kedro pipeline?

I'm a bit confused - what is the common practice for kedro-mlflow usage? It seems slightly uncomfortable to use it only from kedro pipelines, but kedro's intention is fully reproducible research. At the same time, rather rare tutorials on…

mlflow kedro

asked May 15 '21 at 12:32

Andrey Bondarenko

0 votes, 1 answer

How to use Chunk Size for kedro.extras.datasets.pandas.SQLTableDataSet in the kedro pipeline?

I am using kedro.extras.datasets.pandas.SQLTableDataSet and would like to use the chunk_size argument from pandas. However, when running the pipeline, the table gets treated as a generator instead of a pd.dataframe(). How would you use the…

kedro

asked May 13 '21 at 15:24

Jacob Weiss

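When pandas is asked to read in chunks, it returns an iterator of DataFrames rather than one DataFrame, so whatever receives the loaded data has to consume that iterator itself. The sketch below shows the pattern using only the standard library: sqlite3's `fetchmany` stands in for a chunked pandas read, and the `measurements` table, `iter_chunks` and `summarise` are hypothetical names, not Kedro or pandas APIs.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, float]


def iter_chunks(conn: sqlite3.Connection, chunk_size: int) -> Iterator[List[Row]]:
    """Yield rows in fixed-size chunks, the way a chunked loader hands data over (hypothetical helper)."""
    cursor = conn.execute("SELECT id, value FROM measurements ORDER BY id")
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:  # an empty list means the result set is exhausted
            break
        yield chunk


def summarise(chunks: Iterator[List[Row]]) -> Tuple[int, float]:
    """A node-style function that consumes the generator chunk by chunk instead of expecting one table."""
    count, total = 0, 0.0
    for chunk in chunks:
        count += len(chunk)
        total += sum(value for _, value in chunk)
    return count, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(10)],
)
count, total = summarise(iter_chunks(conn, chunk_size=3))
print(count, total)  # 10 22.5
```

With pandas, the equivalent step is typically `pd.concat(chunks, ignore_index=True)` when the full result fits in memory, or per-chunk processing like the loop above when it does not.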

0 votes, 1 answer

Failed while loading data from data set SQLQueryDataSet

I am receiving this error: DataSetError: Failed while loading data from data set SQLQueryDataSet(load_args={}, sql=select * from table) when I run (within kedro jupyter…

kedro

asked Apr 21 '21 at 19:42

Jacob Weiss

0 votes, 1 answer

Adding stream_results=True (execution_options) to kedro.extras.datasets.pandas.SQLQueryDataSet

Is it possible to add execution_options to kedro.extras.datasets.pandas.SQLQueryDataSet? For example, I would like to add stream_results=True to the connection string. engine = create_engine( "postgresql://postgres:pass@localhost/example" ) conn =…

pandas sqlalchemy kedro

asked Apr 20 '21 at 15:58

Jacob Weiss

0 votes, 0 answers

Kedro: Save logging messages by namespace in the pipeline

Intro I am working on a project where I have several different target variables and we utilize the same modeling framework in Kedro to peg a pipeline to each of the target variables. Each pipeline is defined with its own namespace. I have a…

logging namespaces kedro

asked Apr 05 '21 at 23:20

tabris

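With Python's standard `logging` module, one way to split messages by namespace is to give each namespaced pipeline its own logger name and attach a separate handler to each of those loggers. This is a minimal sketch under assumptions: the `my_project` prefix, the `target_a`/`target_b` namespaces and `attach_namespace_handlers` are hypothetical, and it does not claim anything about how Kedro itself names loggers.

```python
import io
import logging
from typing import Dict

NAMESPACES = ["target_a", "target_b"]  # hypothetical pipeline namespaces


def attach_namespace_handlers(root_name: str) -> Dict[str, io.StringIO]:
    """Attach one handler per namespace logger so each namespace's messages go to their own stream."""
    streams: Dict[str, io.StringIO] = {}
    for ns in NAMESPACES:
        stream = io.StringIO()  # use logging.FileHandler(f"{ns}.log") instead to write one file per namespace
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger = logging.getLogger(f"{root_name}.{ns}")
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
        streams[ns] = stream
    return streams


streams = attach_namespace_handlers("my_project")
logging.getLogger("my_project.target_a").info("training model for target_a")
logging.getLogger("my_project.target_b").info("training model for target_b")
print(streams["target_a"].getvalue())  # my_project.target_a INFO training model for target_a
```

Because handlers are attached to the namespace loggers rather than the root logger, a message logged under one namespace never reaches the other namespace's stream.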

0 votes, 2 answers

Parquet file larger than memory consumption of pandas DataFrame

I am storing two different pandas DataFrames as parquet files (through kedro). Both DataFrames have identical dimensions and dtypes (float32) before getting written to disk. Also, their memory consumption in RAM is…

python pandas parquet kedro

asked Mar 16 '21 at 09:03

Nils Blum-Oeste

0 votes, 2 answers

Kedro Conditional Pipes (or alternatives)

I am currently examining different design pattern options for our pipelines. The Kedro framework seems like a good option (allowing a modular design pattern, visualization methods, etc.). The pipelines should be created out of many modules that are…

python design-patterns pipe pipeline kedro

asked Feb 22 '21 at 09:15

Jumpman

0 votes, 3 answers

What does this Python function signature mean in the Kedro tutorial?

I am looking at the Kedro library, as my team is looking into using it for our data pipeline. While going through the official tutorial (Spaceflight), I came across this function: def preprocess_companies(companies: pd.DataFrame) ->…

python-3.x kedro

asked Feb 11 '21 at 16:19

Kevin Seek

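In a signature like `def preprocess_companies(companies: pd.DataFrame) ->…`, the `: pd.DataFrame` after the parameter and whatever follows `->` are type hints (PEP 484). They document the expected input and output types for readers and tools such as type checkers, but Python does not enforce them when the function runs. A stdlib-only sketch, with a hypothetical `preprocess_names` function using `List[str]` in place of `pd.DataFrame`:

```python
from typing import List


def preprocess_names(names: List[str]) -> List[str]:
    """Annotated the same way as the tutorial function: the hints describe the input and output types."""
    return [name.strip().lower() for name in names]


# The hints are stored on the function object...
print(preprocess_names.__annotations__)

# ...but they are not checked at runtime: a tuple is accepted without any error.
print(preprocess_names(("  Alpha", "BETA ")))  # ['alpha', 'beta']
```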

0 votes, 1 answer

TemplatedConfigLoader in register_config_loader not replacing patterns in catalog.yml (kedro)

I am using kedro to manage some data, for which I have a number of dataset CSVs in the same location. As described here, I should be able to store the filepath to this location in a globals.yml file, and use the ${...} syntax in my catalog, but I…

python config hook catalog kedro

asked Feb 04 '21 at 15:17

Michael Cole

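The `${...}` syntax in the catalog is a placeholder that a templating step replaces with a value defined in `globals.yml`. Python's standard `string.Template` happens to use the same `${name}` syntax, so it can illustrate the substitution idea. This is an analogy only, not how Kedro's `TemplatedConfigLoader` is implemented; the `GLOBALS` keys and the catalog entry are hypothetical, and `string.Template` only accepts flat identifiers (no dotted keys).

```python
from string import Template
from typing import Dict

# Hypothetical values that would live in globals.yml
GLOBALS: Dict[str, str] = {"base_path": "data/01_raw", "file_type": "pandas.CSVDataSet"}

# Hypothetical catalog entry written with ${...} placeholders, as in catalog.yml
CATALOG_ENTRY = """\
companies:
  type: ${file_type}
  filepath: ${base_path}/companies.csv
"""


def render(text: str, values: Dict[str, str]) -> str:
    """Replace every ${name} placeholder; substitute() raises KeyError if a name has no value."""
    return Template(text).substitute(values)


print(render(CATALOG_ENTRY, GLOBALS))
```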

0 votes, 0 answers

Kedro 0.17: Override globals.yml with extra params

I'm currently not able to update the globals.yml file with extra params passed at run time, as I previously did with Kedro 0.16.x. I run kedro through run.py. @hook_impl def register_config_loader(self, conf_paths: Iterable[str]) ->…

kedro

asked Feb 01 '21 at 18:08

Vinay V

0 votes, 1 answer

SQLAlchemy Oracle - InvalidRequestError: could not retrieve isolation level

I am having problems accessing tables in an Oracle database over a SQLAlchemy connection. Specifically, I am using Kedro catalog.load('table_name') and getting the error message Table table_name not found. So I decided to test my connection using…

python oracle sqlalchemy kedro

asked Jan 21 '21 at 15:26

Pierre Delecto

0 votes, 1 answer

Parallelism for Entire Kedro Pipeline

I am working on a project where we are processing very large images. The pipeline has several nodes, where each produces output necessary for the next node to run. My understanding is that the ParallelRunner is running the nodes in parallel. It is…

kedro

asked Jan 06 '21 at 15:29

Brian Falkenstein

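When the nodes form a linear chain, no two of them can run at the same time, so parallelism has to come from somewhere else, for example by processing independent images concurrently inside a single node. A minimal sketch with the standard `concurrent.futures` module; `process_image` and `process_all` are hypothetical, and images are represented as plain lists of pixel values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_image(pixels: List[int]) -> int:
    """Hypothetical per-image step; here it just sums the pixel values."""
    return sum(pixels)


def process_all(images: List[List[int]], max_workers: int = 4) -> List[int]:
    """Run the per-image step concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_image, images))


print(process_all([[1, 2, 3], [4, 5], [6]]))  # [6, 9, 6]
```

Threads only speed things up when the per-image work releases the GIL (as many native image and array libraries do). For pure-Python CPU-bound work, `ProcessPoolExecutor` has the same `map` interface, but it needs a picklable top-level function and an `if __name__ == "__main__":` guard on platforms that spawn worker processes.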

0 votes, 2 answers

Kedro install - Cannot uninstall `terminado`

When running kedro install I get the following error: Attempting uninstall: terminado Found existing installation: terminado 0.8.3 ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine…

python pip kedro

asked Nov 21 '20 at 05:22

zeh

0 votes, 1 answer

Specify Kedro data version within DataCatalog?

Is it possible to define data version with Kedro type: pandas.CSVDataSet filepath: data/01_raw/company/cars.csv versioned: True load_version: $USER_DEFINED_VERSION # Wanted to do this Currently, Kedro supports using a CLI to specify load…

kedro

asked Nov 17 '20 at 09:42

mediumnok

0 votes, 1 answer

How do I reproduce experiments or specify the nodes execution order in Kedro?

Since kedro determines the execution graph based on the nodes' inputs/outputs, the order of execution is non-deterministic. It can vary between runs. Even when I set a seed, I may sample different data in different runs. Let's say I have 3 nodes that…

kedro

asked Nov 06 '20 at 08:55

mediumnok

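If several nodes draw from one shared, globally seeded random generator, each node's sample depends on how many draws happened before it, and therefore on the execution order. One common workaround, sketched here in plain Python rather than with any Kedro API, is to give each node its own generator seeded from a fixed value plus the node's name. `SEED`, `sample_a`, `sample_b`, `run_shared` and `run_per_node` are hypothetical names.

```python
import random
from typing import Callable, Dict, List

SEED = 42  # hypothetical project-wide seed


def sample_a(rng: random.Random) -> List[int]:
    return rng.sample(range(100), 3)


def sample_b(rng: random.Random) -> List[int]:
    return rng.sample(range(100), 3)


NODES: Dict[str, Callable[[random.Random], List[int]]] = {"a": sample_a, "b": sample_b}


def run_shared(order: List[str]) -> Dict[str, List[int]]:
    """All nodes share one generator: whichever node runs first gets the first draws."""
    rng = random.Random(SEED)
    return {name: NODES[name](rng) for name in order}


def run_per_node(order: List[str]) -> Dict[str, List[int]]:
    """Each node gets its own generator seeded from SEED and its name, so order no longer matters."""
    return {name: NODES[name](random.Random(f"{SEED}-{name}")) for name in order}


print(run_per_node(["a", "b"]) == run_per_node(["b", "a"]))  # True
```

Forcing an actual execution order is a separate matter: because the graph is built only from inputs and outputs, one node runs after another only if it consumes one of that node's outputs.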

Questions tagged [kedro]