Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.
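
To make "dependency resolution" concrete, here is a toy model in plain Python. It is not Luigi's implementation and does not import Luigi; ToyTask, build, and demo are hypothetical names chosen for illustration. It only assumes the general idea that a task counts as done once its output exists and that dependencies run before the tasks that need them.

```python
# Toy model of dependency resolution; this is NOT Luigi's implementation.
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a pipeline task."""

    def __init__(self, path, deps=()):
        self.path = path        # output file that marks the task as done
        self.deps = list(deps)  # tasks that must finish first

    def requires(self):
        return self.deps

    def complete(self):
        # A task counts as complete once its output file exists.
        return os.path.exists(self.path)

    def run(self):
        with open(self.path, "w") as f:
            f.write("done\n")


def build(task, log):
    """Run dependencies depth-first, skipping tasks that are already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(os.path.basename(task.path))


def demo():
    """Build a three-step chain twice and report what ran each time."""
    with tempfile.TemporaryDirectory() as tmp:
        imp = ToyTask(os.path.join(tmp, "import.txt"))
        ana = ToyTask(os.path.join(tmp, "analysis.txt"), [imp])
        exp = ToyTask(os.path.join(tmp, "export.txt"), [ana])
        first, second = [], []
        build(exp, first)   # runs all three tasks, dependencies first
        build(exp, second)  # outputs already exist, so nothing runs
        return first, second
```

Calling demo() returns two lists: the tasks run on the first build, in dependency order, and the tasks run on the second build, when every output already exists.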

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For the bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install from the cloned directory. Documentation for the bleeding-edge version is published separately.

If you want to run the central scheduler (highly recommended), you need to install Tornado, which is also available from PyPI: pip install tornado.

316 questions
5
votes
2 answers

Organizing files when using Luigi pipeline?

I am using Luigi for my workflow. My workflow is divided into three general parts - import, analysis, export. Within each part, there are multiple Luigi tasks. I could have everything in a single file. But if I want to keep everything separate, as…
5
votes
1 answer

Python Luigi - Continue with External task when satisfied

I am working on a Luigi pipeline that checks if a manually created file exists and if so, continues with the next tasks: import luigi, os class ExternalFileChecker(luigi.ExternalTask): task_namespace='MyTask' path = luigi.Parameter() …
Johan
  • 406
  • 5
  • 19
5
votes
1 answer

How can I get my Luigi scheduler to utilize multiple cores with the parallel-scheduling flag?

I have the following line in my luigi.cfg file (on all nodes, scheduler and workers): [core] parallel-scheduling: true However, when I monitor CPU utilization on my luigi scheduler (with a graph of around ~4000 tasks, handling requests from ~100…
captaincapsaicin
  • 762
  • 1
  • 5
  • 15
5
votes
1 answer

How to avoid running a specific task simultaneously in Luigi with multiple workers

I use Luigi to build data analysis tasks, including plotting with matplotlib. It seems that concurrent matplotlib plotting causes a problem: for some reason, the task returns prematurely without doing anything. (Looks like this is the…
Hiro
  • 475
  • 4
  • 9
5
votes
1 answer

What's a resource in Luigi Python?

In the web interface and in https://github.com/spotify/luigi/blob/master/luigi/task.py I can see that a Task can have "resources". There is also a placeholder function in a Task class called process_resources(), that just returns the empty…
Peter Smit
  • 1,374
  • 11
  • 21
4
votes
0 answers

Using the Jaeger Python client together with Luigi

I'm just starting to use Jaeger for tracing and want to get the Python client to work with Luigi. The root of the problem is that Luigi uses multiprocessing to fork worker processes. The docs mention that this can cause problems and recommend - in…
Achim
  • 14,333
  • 13
  • 70
  • 128
4
votes
0 answers

Using Luigi, how to read PostgreSQL data and then pass such data to the next task in the workflow?

Using Luigi, I want to define a workflow with two "stages": The first one reads data from PostgreSQL. The second one does something with the data. Thus I've started by subclassing luigi.contrib.postgres.PostgresQuery and overriding host, database,…
frb
  • 3,592
  • 2
  • 17
  • 48
4
votes
0 answers

What is the purpose of significant parameter in Luigi?

The documentation says: If a parameter is created with significant=False, it is ignored as far as the Task signature is concerned. Tasks created with only insignificant parameters differing have the same signature but are not the same instance.…
Yankee
  • 1,680
  • 2
  • 20
  • 40
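
The documentation passage quoted in the question above can be illustrated with a toy model in plain Python. This sketch does not use Luigi, and the details of Luigi's own signature scheme may differ; toy_task_id and the Report parameters are hypothetical. It only shows the idea that parameters marked insignificant are left out of the signature.

```python
import hashlib
import json


def toy_task_id(family, params, insignificant=()):
    """Build an identifier from the task name and its significant parameters only."""
    significant = {k: v for k, v in params.items() if k not in insignificant}
    payload = json.dumps(significant, sort_keys=True).encode()
    return f"{family}_{hashlib.md5(payload).hexdigest()[:10]}"


# Same date, different value for the insignificant 'verbose' flag.
a = toy_task_id("Report", {"date": "2020-01-01", "verbose": True}, insignificant={"verbose"})
b = toy_task_id("Report", {"date": "2020-01-01", "verbose": False}, insignificant={"verbose"})
# Different date, which is a significant parameter.
c = toy_task_id("Report", {"date": "2020-01-02", "verbose": True}, insignificant={"verbose"})
```

In this model a and b are equal because they differ only in the insignificant flag, while c differs because the date is significant.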
4
votes
3 answers

Luigi Pipelining : No module named pwd in Windows

I am trying to execute the tutorial given in https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/. I am able to run the program on its own using local scheduler, giving me: Scheduled 2 tasks of which: * 2 ran…
ALEX MATHEW
  • 201
  • 3
  • 11
4
votes
2 answers

Job Scheduler - YAML for writing job definition?

In our legacy job scheduling software (built on top of crontab), we use the Apache config format for writing job definitions and Perl's Config::General to parse the config files. This software is highly customized and has …
Lokesh Agrawal
  • 3,269
  • 6
  • 28
  • 66
4
votes
2 answers

MongoDB in Luigi

I was trying to build a pipeline with Luigi: first getting data from an API, then transforming it, and then saving it to MongoDB. I'm still new to Luigi; my question is how do I implement the output() function so that it specifies output to MongoDB. And how…
Sam
  • 393
  • 6
  • 13
4
votes
1 answer

Luigi task returns unfulfilled dependency at run time when dependency is complete

I am relatively new to creating flows with Luigi and am trying to understand why my small workflow is resulting in an unfulfilled dependency. I am trying to run the task StageProviders(), which has a single dependency ErrorsLogFile(). The tasks that…
Funsaized
  • 1,525
  • 2
  • 12
  • 37
4
votes
2 answers

How to continuously update target file using Luigi?

I have recently started playing around with Luigi, and I would like to find out how to use it to continuously append new data to an existing target file. Imagine I am pinging an API every minute to retrieve new data. Because a Task only runs if…
mtoto
  • 21,499
  • 2
  • 49
  • 64
4
votes
1 answer

Using luigi to update Postgres table

I've just started using the luigi library. I am regularly scraping a website and inserting any new records into a Postgres database. As I'm trying to rewrite parts of my scripts to use luigi, it's not clear to me how the "marker table" is supposed…
durrrutti
  • 950
  • 1
  • 7
  • 17
4
votes
1 answer

MySQL Targets in Luigi workflow

My TaskB requires TaskA. On completion, TaskA writes to a MySQL table, and TaskB is then supposed to take that table as its input. I cannot seem to figure out how to do this in Luigi. Can someone point me to an example or give me a quick…
Rijo Simon
  • 679
  • 2
  • 12
  • 28