
I have a luigi task that performs an unstable computation: think of an optimization process that sometimes does not converge.

import luigi

class MyOptimizer(luigi.Task):
    input_param = luigi.Parameter()
    output_filename = luigi.Parameter(default='result.json')

    def run(self):
        optimize_something(self.input_param, self.output().path)

    def output(self):
        return luigi.LocalTarget(self.output_filename)
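For context, `optimize_something` can be thought of as something like the following stand-in (hypothetical, and deterministic here just for illustration; the real computation fails non-deterministically):

```python
import json

def optimize_something(input_param, output_path):
    # Stand-in for an optimization that may fail to converge.
    # Here, odd parameters "fail" so the behavior is reproducible.
    if int(input_param) % 2 == 1:
        raise RuntimeError(f"optimization did not converge for {input_param}")
    with open(output_path, "w") as f:
        json.dump({"input_param": input_param, "result": "converged"}, f)
```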

Now I would like to build a wrapper task that runs this optimizer several times, with different input parameters, and takes the output of the first run that converges.

My current implementation avoids using MyOptimizer directly, because if it fails, luigi considers the wrapper task failed as well, even though I am okay with some instances of MyOptimizer failing.

class MyWrapper(luigi.Task):
    input_params_list = luigi.ListParameter()
    output_filename = luigi.Parameter(default='result.json')

    def run(self):
        for input_param in self.input_params_list:
            try:
                optimize_something(input_param, self.output().path)
                print(f"Optimizer succeeded with input {input_param}")
                break
            except Exception as e:
                print(f"Optimizer failed with input {input_param}. Trying again...")

    def output(self):
        return luigi.LocalTarget(self.output_filename)

The problem is that this way the attempts are not parallelized. Also, as you can imagine, MyOptimizer and optimize_something are complex tasks that also participate in the luigi-managed data pipeline, so bypassing them creates quite a mess in my code.
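To make the intent concrete, the first-success semantics of the loop above can be isolated into a plain helper (a sketch; `first_successful` is a hypothetical name, not a luigi API, and it still runs attempts sequentially, which is exactly the problem):

```python
def first_successful(func, params):
    """Call func(p) for each p in params; return (p, result) for the
    first call that does not raise, or raise if all of them fail."""
    errors = []
    for p in params:
        try:
            return p, func(p)
        except Exception as e:
            errors.append((p, e))
    raise RuntimeError(f"all {len(errors)} attempts failed: {errors}")
```

What I am looking for is this behavior, but expressed as luigi tasks so the attempts can run in parallel across workers.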

I would appreciate any insights and ideas on how to make this work in a luigi-like fashion :)

DalyaG
