Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
90 questions
0
votes
0 answers
Kedro-mlflow usage - when to use it from notebooks, and when from kedro pipeline?
I'm a bit confused: what is the common practice for kedro-mlflow usage? It seems slightly uncomfortable to use it only from Kedro pipelines, but Kedro's intention is fully reproducible research.
At the same time rather rare tutorials on…
0
votes
1 answer
How to use Chunk Size for kedro.extras.datasets.pandas.SQLTableDataSet in the kedro pipeline?
I am using kedro.extras.datasets.pandas.SQLTableDataSet and would like to use the chunksize argument from pandas. However, when running the pipeline, the table gets treated as a generator instead of a pd.DataFrame.
How would you use the…
Jacob Weiss
0
votes
1 answer
Failed while loading data from data set SQLQueryDataSet
I am receiving this error:
DataSetError: Failed while loading data from data set SQLQueryDataSet(load_args={}, sql=select * from table)
when I run (within kedro jupyter…
0
votes
1 answer
Adding stream_results=True (execution_options) to kedro.extras.datasets.pandas.SQLQueryDataSet
Is it possible to add execution_options to kedro.extras.datasets.pandas.SQLQueryDataSet?
For example, I would like to add stream_results=True to the connection string.
engine = create_engine(
"postgresql://postgres:pass@localhost/example"
)
conn =…
0
votes
0 answers
Kedro: Save logging messages by namespace in the pipeline
Intro
I am working on a project where I have several different target variables and we utilize the same modeling framework in Kedro to peg a pipeline to each of the target variables. Each pipeline is defined with its own namespace. I have a…
tabris
0
votes
2 answers
Parquet file larger than memory consumption of pandas DataFrame
I am storing two different pandas DataFrames as parquet files (through kedro).
Both DataFrames have identical dimensions and dtypes (float32) before getting written to disk. Also, their memory consumption in RAM is…
Nils Blum-Oeste
0
votes
2 answers
Kedro Conditional Pipes (or alternatives)
I am currently examining different design pattern options for our pipelines. The Kedro framework seems like a good option (it allows a modular design pattern, visualization methods, etc.).
The pipelines should be created out of many modules that are…
Jumpman
0
votes
3 answers
What does this Python function signature mean in the Kedro tutorial?
I am looking at the Kedro library, as my team is looking into using it for our data pipeline.
While going through the official tutorial (Spaceflight), I came across this function:
def preprocess_companies(companies: pd.DataFrame) ->…
Kevin Seek
0
votes
1 answer
TemplatedConfigLoader in register_config_loader not replacing patterns in catalog.yml (kedro)
I am using kedro to manage some data, for which I have a number of dataset CSVs in the same location. As described here, I should be able to store the filepath to this location in a globals.yml file, and use the ${...} syntax in my catalog, but I…
0
votes
0 answers
Kedro 0.17: Override globals.yml with extra params
I'm currently not able to update the globals.yml file with extra params passed at run time, as I previously did with Kedro 0.16.x. I run Kedro through run.py.
@hook_impl
def register_config_loader(self, conf_paths: Iterable[str]) ->…
Vinay V
0
votes
1 answer
SQLAlchemy Oracle - InvalidRequestError: could not retrieve isolation level
I am having problems accessing tables in an Oracle database over a SQLAlchemy connection. Specifically, I am using Kedro catalog.load('table_name') and getting the error message Table table_name not found. So I decided to test my connection using…
Pierre Delecto
0
votes
1 answer
Parallelism for Entire Kedro Pipeline
I am working on a project where we are processing very large images. The pipeline has several nodes, where each produces output necessary for the next node to run. My understanding is that the ParallelRunner is running the nodes in parallel. It is…
0
votes
2 answers
Kedro install - Cannot uninstall `terminado`
When running kedro install I get the following error:
Attempting uninstall: terminado
Found existing installation: terminado 0.8.3
ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine…
zeh
0
votes
1 answer
Specify Kedro data version within DataCatalog?
Is it possible to define a data version with Kedro?
type: pandas.CSVDataSet
filepath: data/01_raw/company/cars.csv
versioned: True
load_version: $USER_DEFINED_VERSION # Wanted to do this
Currently, Kedro supports using a CLI to specify load…
mediumnok
0
votes
1 answer
How do I reproduce experiments or specify the nodes execution order in Kedro?
Since Kedro determines the execution graph based on the nodes' inputs/outputs, the order of execution is non-deterministic. It can vary between runs.
Even when I set a seed, I may sample different data in different runs.
Let's say I have 3 nodes that…
mediumnok