Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding-edge documentation can be found here.

If you want to run the central scheduler (highly recommended), you also need to install Tornado, which is available from PyPI as well: pip install tornado.
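
To make the dependency resolution mentioned above more concrete, here is a minimal sketch in plain Python. It does not use Luigi itself: the Task base class, the build helper, and the Extract and Total tasks are hypothetical stand-ins written for illustration. Only the method names (requires, output, run, complete) mirror those of luigi.Task. The sketch assumes that a task counts as done once its output file exists, so upstream tasks run first and finished work is skipped on a second run.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a pipeline task (this is not luigi.Task)."""

    def requires(self):
        return []  # upstream tasks this task depends on

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError  # the actual work

    def complete(self):
        # Assumption: a task is done when its output file already exists.
        return os.path.exists(self.output())


def build(task, log):
    """Run upstream tasks first, skipping anything already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


WORKDIR = tempfile.mkdtemp()  # scratch directory for the example outputs


class Extract(Task):
    def output(self):
        return os.path.join(WORKDIR, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")


class Total(Task):
    def requires(self):
        return [Extract()]

    def output(self):
        return os.path.join(WORKDIR, "total.txt")

    def run(self):
        # Read the upstream output and write the sum of its lines.
        with open(Extract().output()) as src:
            total = sum(int(line) for line in src)
        with open(self.output(), "w") as dst:
            dst.write(str(total))


first_run, second_run = [], []
build(Total(), first_run)   # runs Extract, then Total
build(Total(), second_run)  # nothing to do: both outputs already exist
```

After the first call, first_run records the order in which the tasks ran; the second call leaves second_run empty because both output files are already present. A real Luigi pipeline adds scheduling, parameters, and failure handling on top of this basic idea.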

316 questions
36
votes
2 answers

Python based asynchronous workflow modules : What is difference between celery workflow and luigi workflow?

I am using Django as a web framework. I need a workflow engine that can run synchronous as well as asynchronous (batch) chains of tasks. I found Celery and Luigi as batch-processing workflow engines. My first question is what is the difference between…
user3343061
  • 458
  • 4
  • 7
15
votes
3 answers

Luigi - Unfulfilled %s at run time

I am trying to learn in a very simple way how Luigi works. As a newbie, I came up with this code: import luigi class class1(luigi.Task): def requires(self): return class2() def output(self): return…
Fact
  • 1,308
  • 12
  • 24
14
votes
4 answers

Can luigi rerun tasks when the task dependencies become out of date?

As far as I know, a luigi.Target can either exist, or not. Therefore, if a luigi.Target exists, it wouldn't be recomputed. I'm looking for a way to force recomputation of the task, if one of its dependencies is modified, or if the code of one of the…
Ophir Yoktan
  • 7,130
  • 6
  • 51
  • 92
13
votes
1 answer

Luigi Pipeline beginning in S3

My initial files are in AWS S3. Could someone show me how to set this up in a Luigi Task? I reviewed the documentation and found luigi.S3, but it is not clear to me what to do with it. I then searched the web and only got links from…
nanounanue
  • 6,692
  • 7
  • 36
  • 65
12
votes
0 answers

Is it possible to write a luigi wrapper task that tolerates failed sub-tasks?

I have a Luigi task that performs some non-stable computations. Think of an optimization process that sometimes does not converge. import luigi class MyOptimizer(luigi.Task): input_param = luigi.Parameter() output_filename =…
DalyaG
  • 2,067
  • 2
  • 12
  • 14
12
votes
2 answers

Luigi LocalTarget binary file

I am having trouble writing a binary LocalTarget in a Luigi pipeline in my project. I isolated the problem here: class LuigiTest(luigi.Task): def output(self): return luigi.LocalTarget('test.npz') def run(self): with…
Álvaro Marco
  • 1,889
  • 14
  • 28
12
votes
5 answers

How to reset luigi task status?

Currently, I have a bunch of Luigi tasks queued together, with a simple dependency chain (a -> b -> c -> d). d gets executed first, and a at the end. a is the task that gets triggered. All the targets except a return a luigi.LocalTarget() object…
HackToHell
  • 1,701
  • 4
  • 21
  • 43
11
votes
1 answer

Implementing luigi dynamic graph configuration

I am new to Luigi and came across it while designing a pipeline for our ML efforts. Though it wasn't fitted to my particular use case, it had so many extra features that I decided to make it fit. Basically what I was looking for was a way to be able to…
Veltzer Doron
  • 843
  • 1
  • 10
  • 28
11
votes
2 answers

Passing Python objects between Tasks in Luigi?

I was coding my first project in Python 3.6 using Spotify's Luigi to arrange some Natural Language Processing Tasks in a pipeline. I noticed that the output() function of a Task class always returns some kind of Target object, which is just some…
Kaleidophon
  • 501
  • 5
  • 14
10
votes
4 answers

Running Luigi task from cmd - "No module named tasks"

I am having problems running a Luigi task through the Windows cmd. Here are the facts: Anaconda is installed in C:\ProgramData\Anaconda2 (Python 2.7). Anaconda has added its paths to the PATH variable, but there is no PYTHONPATH variable …
9
votes
1 answer

Airflow: how to specify quantitative usage of a resource pool?

I am looking at several open source workflow schedulers for a DAG of jobs with heterogeneous RAM usage. The scheduler should not only schedule less than a maximum number of threads, but should also keep the total amount of RAM of all concurrent…
TemplateRex
  • 65,583
  • 16
  • 147
  • 283
9
votes
2 answers

How to log from Python Luigi

I am trying to build a strategy to log from Luigi in such a way that there is a configurable list of outputs including stdout and a custom list of files. I would like to be able to set the logging level at runtime. Our system uses Luigi to call…
sakurashinken
  • 2,851
  • 5
  • 23
  • 56
9
votes
1 answer

parallelizing tasks in Luigi Orchestrator

I defined three tasks T1, T2, and T3, and then a task T4 as follows: class T4(luigi.Task): def requires(self): return [T1(), T2(), T3()] Is there a natural way to tell Luigi that I want these tasks T1, T2, and T3 to be executed in…
sweeeeeet
  • 1,599
  • 2
  • 19
  • 45
8
votes
2 answers

Where is the luigi config file?

I have installed Luigi with pip and would like to change the port for the web UI. I tried to find the config file but couldn't. Do I need to create one?
Iamasupernoob
  • 83
  • 1
  • 4
8
votes
2 answers

Can Luigi propagate exception or return any result?

I am using Luigi to launch some pipeline. Let's take a simple example: task = myTask() w = Worker(scheduler=CentralPlannerScheduler(), worker_processes=1) w.add(task) w.run() Now let's say that myTask raises an exception during execution. All…
MathiasDesch
  • 332
  • 2
  • 14