Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.
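To make "dependency resolution" concrete, here is a toy sketch in plain Python. It is not Luigi code and does not use Luigi's scheduler: the ToyTask class and the build function are hypothetical names invented for this illustration. It only borrows the general idea that a task declares what it requires, and that upstream tasks run first while already-completed ones are skipped. In Luigi itself, completeness is normally determined by whether a task's outputs exist, not by an in-memory flag.

```python
class ToyTask:
    """Hypothetical stand-in for a pipeline task; NOT Luigi's Task class."""

    def __init__(self, name, requires=()):
        self.name = name
        self._requires = list(requires)
        self.done = False  # assumption: an in-memory flag instead of output targets

    def requires(self):
        # Tasks that must be complete before this one may run.
        return self._requires

    def complete(self):
        return self.done

    def run(self, log):
        # Record the execution order, then mark the task as finished.
        log.append(self.name)
        self.done = True


def build(task, log):
    """Run upstream tasks first (depth-first), skipping completed ones."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run(log)


extract = ToyTask("extract")
clean = ToyTask("clean", requires=[extract])
report = ToyTask("report", requires=[clean, extract])

order = []
build(report, order)
print(order)
```

Asking for the final task is enough here: each shared dependency runs once, and calling build again does nothing because every task already reports itself complete.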

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding-edge documentation can be found here.

If you want to run the central scheduler (highly recommended), you also need to install Tornado, which is available from PyPI as well: pip install tornado.
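As a quick sanity check after installation, the following sketch uses only the standard library to report whether the luigi and tornado packages can be found in the current environment. The helper name is_installed is made up for this illustration and is not part of Luigi.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate a top-level package."""
    return find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("luigi", "tornado"):
        if is_installed(name):
            print(f"{name}: found")
        else:
            print(f"{name}: missing (try: pip install {name})")
```

This only checks that the packages are discoverable; it does not verify versions or start the scheduler.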

316 questions
4
votes
1 answer

MongoDB in Luigi Python

I would like to know if there is a way to output to MongoDB in Luigi. I see in the documentation that they support files (local FS, HDFS), S3, and PostgreSQL, but not MongoDB. If not, could someone explain to me why not? Maybe it is a bad idea to have it? I…
user2288043
  • 221
  • 4
  • 13
4
votes
1 answer

How to run a luigi task with spark-submit and pyspark

I have a luigi python task which includes some pyspark libs. Now I would like to submit this task on mesos with spark-submit. What should I do to run it? Below is my code skeleton: from pyspark.sql import functions as F from pyspark import…
zuhakasa
  • 133
  • 2
  • 11
4
votes
3 answers

How do you pass multiple arguments to a Luigi subtask?

I have a Luigi task that requires a subtask. The subtask depends on parameters passed through by the parent task (i.e. the one that is doing the requiring). I know you can specify a parameter that the subtask can use by setting... def…
guzman
  • 167
  • 2
  • 11
4
votes
1 answer

Luigi write file directly to S3

I'm creating a data pipeline with Luigi and I'm trying to write the processed data to S3 bucket directly. The code I used is: import luigi from luigi.s3 import S3Target, S3Client class myTask(luigi.Task): def requires(self): return…
Z.G
  • 83
  • 5
4
votes
1 answer

Using Parameters in python luigi

I am triggering Luigi via luigi.run(["--local-scheduler"], main_task_cls=Test(Server = ActiveServer, Database = DB)) and in my class I have: class Test(luigi.Task): Database = luigi.Parameter() Server = luigi.Parameter() but the…
KillerSnail
  • 2,637
  • 5
  • 39
  • 59
4
votes
1 answer

Running Hadoop jar using Luigi python

I need to run a Hadoop jar job using Luigi from python. I searched and found examples of writing mapper and reducer in Luigi but nothing to directly run a Hadoop jar. I need to run a Hadoop jar compiled directly. How can I do it?
RAJKUMAR PADILAM
  • 101
  • 1
  • 10
3
votes
1 answer

Luigi: how to pass different arguments to leaf tasks?

This is my second attempt at understanding how to pass arguments to dependencies in Luigi. The first one was here. The idea is: I have TaskC which depends on TaskB, which depends on TaskA, which depends on Task0. I want this whole sequence to be…
lesisey
  • 41
  • 3
3
votes
1 answer

Luigi: Is there a way to pass 'false' to a bool parameter from the command line?

I have a Luigi task with a boolean parameter that is set to True by default: class MyLuigiTask(luigi.Task): my_bool_param = luigi.BoolParameter(default=True) When I run this task from terminal, I sometimes want to pass that parameter as False,…
DalyaG
  • 2,067
  • 2
  • 12
  • 14
3
votes
1 answer

Can't pickle : attribute lookup class_name on abc failed

I'm getting the above error as I try to create dependencies (subtasks) based on the dependency relationships defined in a dictionary ("cmdList"). For instance, "BDX010" is a dependency of "BDX020". I'm using Python 3.7. Please see the stack trace at the…
Jason O.
  • 2,704
  • 3
  • 26
  • 59
3
votes
0 answers

how can I pause a docker container running luigi?

I have a docker container running a 10-hour luigi task. I want to pause the container to use my laptop for something else. I tried "docker pause", but when I unpause, the luigi scheduler shows no tasks running, so I have to start again. Is there any…
simon
  • 2,207
  • 13
  • 19
3
votes
1 answer

Luigi Programmatic Configuration

I was using a configuration file similar to the following for my luigi workflows: # Luigi logging configuration [logging] version = 1 disable_existing_loggers = false [logging.formatters.simple] format = "{levelname:8} {asctime} {module}:{lineno}…
treyhakanson
  • 3,904
  • 1
  • 11
  • 31
3
votes
1 answer

How to ignore failures on Luigi tasks triggered inside another task's run()

Consider the following tasks: import luigi class YieldFailTaskInBatches(luigi.Task): def run(self): for i in range(5): yield [ FailTask(i, j) for j in range(2) ] class…
Vitor Baptista
  • 1,495
  • 1
  • 14
  • 25
3
votes
1 answer

Python Luigi with Docker - Threading/Signal issue

Overview We're building a pipeline inside a Docker container using Luigi. This is my first time using Luigi and I'm trying to get it running but I'm stuck on a Python threading/signal error. What we're building We have a container that runs a…
3
votes
2 answers

Luigi global variables

I would like to set some target paths as global variables in Luigi. The reason is that the target paths I'm using are based on the last run of a given numerical weather prediction (NWP), and it takes some time to get the value. Once I have checked…
3
votes
2 answers

Python script produces zombie processes only in Docker

I have a quite complicated setup with Luigi (https://github.com/spotify/luigi), https://github.com/kennethreitz/requests-html, and https://github.com/miyakogi/pyppeteer. But long story short: everything works fine on my local Ubuntu (17.10) desktop,…
scythargon
  • 2,750
  • 3
  • 23
  • 49