Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command line integration, and much more.

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding-edge documentation is available separately.

If you want to run the central scheduler (highly recommended), you also need to install Tornado, which is available from PyPI as well: pip install tornado.

316 questions
0
votes
0 answers

cronjob not executed on docker swarm deployment

I am using docker swarm to control a deployment with several containers used for a machine learning application. I have a bash script which sends commands to be executed by some containers. When I do execute this script manually in the console,…
Karlovalentin
  • 301
  • 2
  • 13
0
votes
2 answers

How do Luigi parameters work?

So I have two tasks (let's say TaskA and TaskB). I want both tasks to run hourly, but TaskB requires TaskA. TaskB does not have any parameters, but TaskA has two parameters for the day and the hour. If I run TaskB on the command line, would I…
BlaqICE
  • 199
  • 11
0
votes
1 answer

How to update and delete data using Luigi?

Which Luigi module can be used to update or delete data in a database? I have used CopyToTable and SQLAlchemy for inserting data. The documentation is not clear on how updates and deletes can be achieved. Please advise.
aka
  • 189
  • 1
  • 4
  • 15
0
votes
1 answer

Persistence store for state of a job in LUIGI

I started with Luigi recently and have a few questions that I was unable to answer myself using the documentation. The question is about the state of a job in Luigi. With Luigi we can set some global configuration (record_task_history) to track history of job…
learner
  • 1,673
  • 6
  • 27
  • 51
0
votes
1 answer

Retry .complete() for WrapperTask

I am using Luigi to run several tasks, and then I need to bulk transfer the output to a standardized file location. I've written a WrapperTask with an overridden complete() method to do this: from luigi.task import flatten class…
Alex Spangher
  • 857
  • 2
  • 13
  • 20
0
votes
1 answer

Automatic instantiate in luigi?

In luigi.Task.run, we need to serialize objects into files/database/etc.: MyTask(luigi.Task): param = luigi.Parameter() def requires(self): AnotherTask(self.param) def output(self): …
keisuke
  • 1,703
  • 4
  • 16
  • 28
0
votes
1 answer

Luigi : Step by Step instructions not working

I'm new to Python and have installed Luigi 2.0.1 on RHEL Linux. I'm trying to run a sample program: import luigi class MyTask(luigi.Task) : param = luigi.Parameter(default=42) def requires(self): return…
venBigData
  • 500
  • 6
  • 18
0
votes
1 answer

Luigi doesn't work as expected with Spark & Redshift

I'm running an EMR Spark cluster (uses YARN) and I'm running Luigi tasks directly from the EMR master. I have a chain of jobs that depends on data in S3 and after a few SparkSubmitTasks will eventually end up in Redshift. import luigi import…
jackar
  • 704
  • 5
  • 14
-1
votes
0 answers

python/luigi - Checking length of list of s3 objects does not return correct result

s3obj_list = client.listdir(path=self.input().path) filecount = len(list(s3key_list)) print('File count: ' + str(filecount)) if len(list(s3key_list)): print('Not empty') else: print('Empty') ------ Output File count: 4 Empty ----- Why does…
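
One possible explanation for output like the above, offered only as a guess since the excerpt is truncated: if the listing object is a generator, the first list(...) call consumes it and a second call sees nothing. The sketch below uses a hypothetical stand-in function, list_keys, rather than any real S3 client, purely to show that effect.

```python
def list_keys(keys):
    """Hypothetical stand-in for a listing call that yields keys lazily."""
    for key in keys:
        yield key


def count_twice(listing):
    """Materialize the same iterable twice, as the excerpt appears to do."""
    first = len(list(listing))   # consumes a generator completely
    second = len(list(listing))  # an exhausted generator yields nothing
    return first, second


def count_once(listing):
    """Materialize once and reuse the resulting list for both checks."""
    items = list(listing)
    return len(items), len(items)
```

With a generator argument, count_twice reports a non-zero count followed by zero, while count_once gives the same count both times.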
-1
votes
1 answer

Luigi read application specific property file and make it available across all tasks

I am using Luigi for a machine learning workflow. I have a requirement that there will be an environment-specific config or property file. Can someone tell me how to load the environment-specific property file and make it available across all…
Anantha
  • 75
  • 1
  • 7
-1
votes
1 answer

Luigi dependencies specification issue with a separate task

I have 3 Luigi tasks: the first generates an output file that is written to Hadoop, the second uses this output file to load it into Elasticsearch, and the third gets a completely separate file and also loads it into Elasticsearch. Third task is rather…
Nikita Vlasenko
  • 2,837
  • 4
  • 33
  • 62
-1
votes
1 answer

Run Luigi task that depends on another task

I have one task SeqrMTToESTask that depends on another one called SeqrVCFToMTTask. You can see the full code here: https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/master/luigi_pipeline/seqr_loading.py Now, I ran the first task…
Nikita Vlasenko
  • 2,837
  • 4
  • 33
  • 62
-1
votes
1 answer

Problems producing desired luigi output

I am trying to create a pipeline that takes in 3 files, takes n rows from each file (represented by obs_num), compares each of the values in the files to a random float between 0 and 1, and either returns the obs_num if it is greater than…
Emm
  • 1,634
  • 10
  • 27
-1
votes
1 answer

How to run a pipeline when the requirements of a task change each time?

I have a pipeline: F -> M -> S, where F, M and S are tasks. I call Luigi with task S. Task S requires M, and M requires F. But sometimes M requires D, and other times it requires B. F, D and B are different, nothing alike, but the output of all these…
-2
votes
1 answer

How to create singleton task in Luigi?

I need to run task A once a day, at 0:00:00 to be specific. But task execution can take more than 24 hours. In this case, I should not rerun the task; I should skip execution. How can I implement such a singleton task in Luigi?
Sklavit
  • 1,813
  • 18
  • 23