How to change the process count of the ParallelRunner in Kedro?

Question

My pipeline makes a lot of HTTP requests. Since this isn’t a CPU-heavy operation, I’d like to spin up more processes than the number of CPU cores. How can I change this?

score 2 · Answer 1 · answered Nov 11 '19 at 09:48

ParallelRunner supports the max_workers parameter, but there is currently no way to pass it from the kedro run CLI command; this was done to reduce the complexity of the CLI. You can add a CLI parameter yourself, or just hard-code the value when instantiating the ParallelRunner in kedro_cli.py. The runner part might look like:

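# ParallelRunner must be imported in kedro_cli.py: from kedro.runner import ParallelRunner
runner_class = load_obj(runner, "kedro.runner") if runner else SequentialRunner
# `runner` is the CLI string, so compare the loaded class; the keyword is max_workers
runner_params = {"max_workers": 100} if runner_class is ParallelRunner else {}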

context = load_context(Path.cwd(), env=env)
context.run(
    tags=tag,
    runner=runner_class(**runner_params),
    node_names=node_names,
    from_nodes=from_nodes,
    to_nodes=to_nodes,
    from_inputs=from_inputs,
    load_versions=load_version,
    pipeline_name=pipeline,
)
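
If you would rather add a parameter than hard-code the value, the plumbing could be sketched roughly as below. Note that build_runner_kwargs is a hypothetical helper and not part of Kedro; it only assumes that ParallelRunner is the one runner accepting max_workers.

```python
from typing import Any, Dict, Optional


def build_runner_kwargs(
    runner_name: Optional[str], max_workers: Optional[int]
) -> Dict[str, Any]:
    """Build keyword arguments for the runner constructor.

    Hypothetical helper (not a Kedro API). Assumes only ParallelRunner
    accepts the max_workers keyword.
    """
    if max_workers is None:
        # Nothing requested: let the runner use its own default.
        return {}
    if max_workers < 1:
        raise ValueError("max_workers must be a positive integer")
    if runner_name != "ParallelRunner":
        raise ValueError("max_workers only applies to ParallelRunner")
    return {"max_workers": max_workers}
```

In kedro_cli.py you could then add something like @click.option("--max-workers", type=int, default=None) to the run command (the option name is an assumption here) and instantiate the runner with runner_class(**build_runner_kwargs(runner, max_workers)).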
