
I have written a scraper using Scrapy in Python. It contains 100 start_urls.

I want to terminate the scraping process once a condition is met, i.e. terminate scraping if a particular div is found. By terminate I mean it should stop scraping all of the URLs.

Is it possible?

user2129794
  • Possible duplicate: http://stackoverflow.com/questions/4448724/force-my-scrapy-spider-to-stop-crawling – ρss May 27 '14 at 08:43
  • You may have a look at the [`CloseSpider` Exception](http://doc.scrapy.org/en/latest/topics/exceptions.html#closespider), but requests that are still in progress (HTTP request sent, response not yet received) will still be parsed. No new request will be processed though. – paul trmbrth May 27 '14 at 09:35
  • @paultrmbrth I am getting the following error when using this: `raise CloseSpider('bandwidth_exceeded')` → `exceptions.NameError: global name 'CloseSpider' is not defined` – user2129794 May 27 '14 at 10:09
  • Simply add this line at the beginning of your source file: `from scrapy.exceptions import CloseSpider` – paul trmbrth May 27 '14 at 10:10
  • @paultrmbrth you should post it as an answer. Deserves to be accepted. – alecxe May 27 '14 at 17:00

1 Answer


What you're looking for is the CloseSpider exception.

Add the following line somewhere at the top of your source file:

from scrapy.exceptions import CloseSpider

And when you detect that your termination condition is met, simply do something like

    raise CloseSpider('termination condition met')

in your callback method (instead of returning or yielding an Item or Request).

Note that requests that are still in progress (HTTP request sent, response not yet received) will still be parsed. No new request will be processed though.
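Putting the two pieces together, here is a minimal self-contained sketch (not from the original answer; the `div.stop-marker` selector and the placeholder start URLs are assumptions) of a spider that stops the whole crawl once the div is found:

    import scrapy
    from scrapy.exceptions import CloseSpider

    class ExampleSpider(scrapy.Spider):
        name = "example"
        # Placeholder for the 100 start_urls mentioned in the question.
        start_urls = ["http://example.com/page/%d" % i for i in range(1, 101)]

        def parse(self, response):
            # Hypothetical selector; replace with whatever identifies your div.
            if response.css("div.stop-marker"):
                # Stops scheduling new requests; responses already in flight
                # will still be processed before the spider closes.
                raise CloseSpider("termination condition met")

            # Normal scraping path when the condition is not met.
            yield {"url": response.url, "title": response.css("title::text").get()}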

paul trmbrth