I have a spider that fetches the latest URLs within a particular date range from a paginated webpage. Once it has collected all of the latest URLs, the spider should be closed.
How can I close a spider?
I looked at the question "Force stop the spider", but I would rather not raise an exception just to close the spider.
Is there another way to achieve the same result?

Sarvesh

1 Answer

You should use the Close Spider extension.

The conditions for closing a spider can be configured through the following settings:

  • CLOSESPIDER_TIMEOUT
  • CLOSESPIDER_ITEMCOUNT
  • CLOSESPIDER_PAGECOUNT
  • CLOSESPIDER_ERRORCOUNT
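
For illustration, here is a rough sketch of how these settings might look in a project's settings.py. The setting names are the ones listed above; the numbers are placeholder assumptions, not recommended values. As far as I know, each setting defaults to 0, which leaves that condition switched off.

```python
# settings.py (sketch): the thresholds below are illustrative assumptions.
# A value of 0 leaves the corresponding condition disabled.

CLOSESPIDER_TIMEOUT = 3600     # close once the spider has been open this many seconds
CLOSESPIDER_ITEMCOUNT = 500    # close once this many items have passed the item pipeline
CLOSESPIDER_PAGECOUNT = 1000   # close once this many responses have been crawled
CLOSESPIDER_ERRORCOUNT = 10    # close once this many errors have been received
```

The same keys can also be placed in a spider's `custom_settings` dict if only one spider in the project should be limited.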

dataisbeautiful
  • Is it possible to add custom flags to this list? – Sarvesh Sep 24 '14 at 04:24
  • You can easily create your own extension, or extend the Close Spider one, if you want to add to this functionality. [github](https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py) – dataisbeautiful Sep 24 '14 at 04:28
  • Thanks @dataisbeautiful, that sounds like a better solution than raising an exception. I will definitely try it. – Sarvesh Sep 24 '14 at 05:03