I am quite new to Scrapy and want to try the following: Extract some values from a webpage, store it in a variable and use it in my main script. Therefore I followed their tutorial and changed code for my purposes:
import scrapy
from scrapy.crawler import CrawlerProcess
class QuotesSpider(scrapy.Spider):
name = "quotes"
start_urls = [
'http://quotes.toscrape.com/page/1/'
]
custom_settings = {
'LOG_ENABLED': 'False',
}
def parse(self, response):
global title # This would work, but there should be a better way
title = response.css('title::text').extract_first()
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(QuotesSpider)
process.start() # the script will block here until the crawling is finished
print(title) # Verify if it works and do some other actions later on...
This would work so far, but I am pretty sure it is not a good style, or even has some bad side effects if I define the title variable as global. If I skip that line, then I get the "undefined variable" error of course :/ Therefore I am searching for a way to return the variable and use it in my main script.
I have read about item pipeline but I was not able to make it work.
Any help/ideas are heavily appreciated :) Thanks in advance!