I have a Luigi task that requires a subtask. The subtask depends on parameters passed through by the parent task (i.e. the one that is doing the requiring). I know you can specify a parameter that the subtask can use by setting...

def requires(self):
    return subTask(some_parameter)

...then on the subtask, receiving the parameter by setting...

x = luigi.Parameter()

That only appears to let you pass through one parameter though. What is the best way to send through an arbitrary number of parameters, of whatever types I want? Really I want something like this:

class parentTask(luigi.Task):

    def requires(self):
        return subTask({'file_postfix': 'foo',
                        'file_content': 'bar'
        })

    def run(self):
        return


class subTask(luigi.Task):
    params = luigi.DictParameter()

    def output(self):
        return luigi.LocalTarget("file_{}.csv".format(self.params['file_postfix']))

    def run(self):
        with self.output().open('w') as f:
            f.write(self.params['file_content'])

As you can see, I tried using luigi.DictParameter instead of a plain luigi.Parameter, but I get TypeError: unhashable type: 'dict' from somewhere deep inside Luigi when I run the above.

Running Python 2.7.11, Luigi 2.1.1

guzman

3 Answers

What is the best way to send through an arbitrary number of parameters, of whatever types I want?

The best way is to use named parameters, e.g.,

# in the parent task's requires()
return MySampleSubTask(x=local_x, y=local_y)

class MySampleSubTask(luigi.Task):
    x = luigi.Parameter()
    y = luigi.Parameter()
MattMcKnight

What is the best way to send through an arbitrary number of parameters, of whatever types I want?

You could follow this example. It uses a single mixin class (ParameterCollector) as the one place where all the parameters to be passed along are defined. When there are many subtasks, this avoids redefining the same parameters on every single task that needs to pass them on.

import luigi


class ParameterCollector(object):
    param1 = luigi.Parameter()
    param2 = luigi.Parameter()

    def collect_params(self):
        return {'param1': self.param1, 'param2': self.param2}


class TaskB(ParameterCollector, luigi.Task):
    def requires(self):
        return []

    def output(self):
        return luigi.LocalTarget('/tmp/task1_success')

    def run(self):
        with self.output().open('w') as f:
            f.write(self.param1)


class TaskA(ParameterCollector, luigi.Task):
    def requires(self):
        a = TaskB(**self.collect_params())
        print(a)
        return a

    def output(self):
        return luigi.LocalTarget('/tmp/task2_success')

    def run(self):
        with self.output().open('w') as f:
            f.write(str([self.param1, self.param2]))


if __name__ == '__main__':
    luigi.run()
hamed

OK, so I found that this works as expected in Python 3.5 (the problem is still there in 3.4).

I don't have the time to get to the bottom of this today, so no further details.
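On a setup where the DictParameter version still raises the TypeError, one possible workaround is to send the dictionary through as a JSON string, since strings are hashable and fit in a plain luigi.Parameter. The following is only a sketch of the serialization step, assuming every value in the dictionary is JSON-serializable. The helper names encode_params and decode_params are hypothetical and not part of Luigi.

```python
import json


def encode_params(params):
    """Turn a dict into a JSON string (hypothetical helper, not a Luigi API).

    sort_keys=True keeps the string stable regardless of key order, so the
    same dict always produces the same parameter value.
    """
    return json.dumps(params, sort_keys=True)


def decode_params(encoded):
    """Turn the JSON string back into a dict (hypothetical helper)."""
    return json.loads(encoded)


# What the parent task would pass along instead of the raw dict.
encoded = encode_params({'file_postfix': 'foo', 'file_content': 'bar'})

# Unlike a dict, the encoded string can be hashed without a TypeError.
hash(encoded)

# What the subtask would do with the string it receives.
params = decode_params(encoded)
print("file_{}.csv".format(params['file_postfix']))
```

With this approach the parent would return subTask(params=encode_params({...})), the subtask would declare params = luigi.Parameter(), and it would call decode_params(self.params) inside output() and run(). The trade-off is that non-JSON types (dates, custom objects) need their own conversion.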

guzman