I am working on building my first Luigi pipeline, and I am currently testing tasks individually before building my dependencies. During testing, I am using a version of the following main method to build a task:

    if __name__ == "__main__":
        headers = dict()
        headers["Content-Type"] = "application/json"
        headers["Accept"] = "application/json"
        # luigi.build takes a list of task instances
        luigi.build([CSVValidator(jsonfile='/sample_input/sample_csv.json',
                                  docfile=None,
                                  error_limit=2,
                                  order_fields=3,
                                  output_file='validation_is_us.txt',
                                  header=headers)])
        luigi.run()

This is what my csv_validator looks like:

    import json
    import time

    import luigi
    import requests


    class CSVValidator(luigi.Task):
        jsonfile = luigi.Parameter()
        docfile = luigi.Parameter()
        error_limit = luigi.Parameter()
        order_fields = luigi.Parameter()
        output_file = luigi.Parameter()
        header = luigi.DictParameter()

        def output(self):
            return luigi.LocalTarget(self.output_file + "/csv_validator_data_%s.txt" % time.time())

        def run(self):
            output_file = self.output().open('w')
            files = {}
            data = {}
            files["jsonfile"] = open(self.jsonfile, 'rb')
            files["docfile"] = open(self.docfile, 'rb')
            data["error_limit"] = self.error_limit
            data["order_fields"] = self.order_fields
            # The request headers come from the task's `header` parameter
            r = requests.post(*****~~~~~*****~~~~~,
                              headers=self.header,
                              data=data, files=files)
            task_response = r.text.encode(encoding="UTF-8")
            print(type(task_response))
            print(task_response)
            jsontaskdata = json.loads(task_response)
            json.dump(jsontaskdata, output_file)
            print("validated")
            output_file.close()

This task, however, is never actually run. Instead, the Luigi central scheduler reports that it is already complete:

    ===== Luigi Execution Summary =====
    Scheduled 2 tasks of which:
    * 1 complete ones were encountered:
        - 1 CSVValidator(...)
    * 1 ran successfully:
        - 1 Downloader(...)
    This progress looks :) because there were no failed tasks or missing dependencies

Other tasks I have created, Downloader for example, do run successfully every time. What makes Luigi consider a task complete? I don't understand why CSVValidator is treated as already done.
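For context, my current understanding (an assumption on my part, not something I have checked against the Luigi source) is that a task counts as complete when every target returned by its output() already exists, and that a task with no outputs is never considered complete. Here is a minimal sketch of that rule in plain Python. FileTarget, is_complete and demo are hypothetical names I made up for illustration; none of this is Luigi's own code.

```python
import os
import tempfile


class FileTarget:
    """Hypothetical stand-in for a file-backed target such as luigi.LocalTarget."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        # A file target "exists" once something has been written at its path.
        return os.path.exists(self.path)


def is_complete(outputs):
    """Assumed rule: no outputs means not complete; otherwise all must exist."""
    if not outputs:
        return False
    return all(target.exists() for target in outputs)


def demo():
    # Check one target that has been written, one that has not, and no targets.
    with tempfile.TemporaryDirectory() as tmp:
        present = FileTarget(os.path.join(tmp, "already_written.txt"))
        missing = FileTarget(os.path.join(tmp, "not_written_yet.txt"))
        with open(present.path, "w") as handle:
            handle.write("done")
        return is_complete([present]), is_complete([missing]), is_complete([])
```

If that assumption is right, I would expect CSVValidator to be skipped only when the path built in output() already exists at scheduling time, which is the part I cannot explain.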
Thanks for your time!