I have a Luigi task that performs an unstable computation. Think of an optimization process that sometimes does not converge.
import luigi

class MyOptimizer(luigi.Task):
    input_param = luigi.Parameter()
    output_filename = luigi.Parameter(default='result.json')

    def run(self):
        optimize_something(self.input_param, self.output().path)

    def output(self):
        return luigi.LocalTarget(self.output_filename)
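For context, optimize_something is just a placeholder for my real code. A hypothetical version with the behaviour described above (raises when the optimization does not converge, writes a JSON result otherwise) might look like this:

import json
import random

def optimize_something(input_param, output_path):
    # Hypothetical optimizer: pretend it only converges some of the time.
    if random.random() < 0.5:
        raise RuntimeError(f"Optimization did not converge for {input_param}")
    # On convergence, write the result to the path of the task's output target.
    with open(output_path, 'w') as f:
        json.dump({'input_param': input_param, 'result': 42}, f)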
Now I would like to build a wrapper task that will run this optimizer several times, with different input parameters, and will take the output of the first run that converged.
The way I am implementing it now is to not use MyOptimizer at all, because if it fails, Luigi will consider the wrapper task failed as well, whereas I am fine with some instances of MyOptimizer failing.
class MyWrapper(luigi.Task):
    input_params_list = luigi.ListParameter()
    output_filename = luigi.Parameter(default='result.json')

    def run(self):
        for input_param in self.input_params_list:
            try:
                optimize_something(input_param, self.output().path)
                print(f"Optimizer succeeded with input {input_param}")
                break
            except Exception:
                print(f"Optimizer failed with input {input_param}. Trying again...")

    def output(self):
        return luigi.LocalTarget(self.output_filename)
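For completeness, this is roughly how the wrapper gets run locally (the parameter values here are just placeholders):

if __name__ == '__main__':
    # Run the wrapper with a few candidate parameters on the local scheduler.
    luigi.build(
        [MyWrapper(input_params_list=['param_a', 'param_b', 'param_c'])],
        local_scheduler=True,
    )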
The problem is that this way, the runs are not parallelized. Also, you can imagine that MyOptimizer and optimize_something are complex tasks that also participate in the data pipeline handled by Luigi, so bypassing them creates quite a mess in my code.
I would appreciate any insights and ideas on how to make this work in a luigi-like fashion :)