
I have an application which schedules scrapy crawl jobs via scrapyd. Items flow nicely to the DB, and I can monitor the job status via the listjobs.json endpoint. So far so good, and I can even tell when jobs are finished.
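For reference, this is roughly how I poll job status today. It's just a sketch: the host/port and project name are placeholders for my actual setup, and I'm using only the stock listjobs.json endpoint.

    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    SCRAPYD = "http://localhost:6800"  # placeholder: my scrapyd host/port

    def list_jobs(project):
        """Call scrapyd's listjobs.json endpoint for the given project."""
        url = f"{SCRAPYD}/listjobs.json?{urlencode({'project': project})}"
        with urlopen(url) as resp:
            return json.load(resp)

    def finished_ids(listing):
        """Pull the ids of finished jobs out of a listjobs.json response."""
        return [job["id"] for job in listing.get("finished", [])]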

However, sometimes jobs fail, maybe because of an HTTP error or bad credentials. I would like to access the finished jobs' statuses, preferably from the scrapyd API — something along the lines of what listjobs.json gives me today. I would love a result that looks like:

{"status": "ok",
 "error": [{"id": "78391cc0fcaf11e1b0090800272a6d06", "spider": "spider1"}],
 "running": [{"id": "422e608f9f28cef127b3d5ef93fe9399", "spider": "spider2", "start_time": "2012-09-12 10:14:03.594664"}],
 "finished": [{"id": "2f16646cfcaf11e1b0090800272a6d06", "spider": "spider3", "start_time": "2012-09-12 10:14:03.594664", "end_time": "2012-09-12 10:24:03.594664"}]}

Of course, I can have the jobs themselves update some DB or file, and check that from the app, but I was wondering if there's a cleaner way.
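The closest workaround I've considered without touching a DB or file: scrapyd serves each job's log under its /logs/ path, so after a job shows up in "finished" I could fetch the log and scan it for failure markers. A sketch of the scanning part — the marker strings are my own heuristic for what Scrapy writes on errors, not anything the scrapyd API guarantees:

    # Heuristic markers I'd look for in a finished job's scrapy log.
    ERROR_MARKERS = ("] ERROR:", "Traceback (most recent call last)")

    def job_failed(log_text):
        """Return True if the log text contains any failure marker."""
        return any(marker in line
                   for line in log_text.splitlines()
                   for marker in ERROR_MARKERS)

This still feels like log scraping rather than a real status API, which is why I'm asking whether something cleaner exists.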

Oren Yosifon
