I'm scraping websites with Scrapy and filtering information such as time and day with regular expressions. I get the whole string returned, but also a part of it as an additional item. How can I exclude this part so that only the whole string is returned?
import re
import scrapy

class posSpider(scrapy.Spider):
    start_urls = ["https://posaunenchor-eibach.jimdofree.com/"]

    def parse(self, response):
        zeitpattern = re.compile(r'\s((montag[s]?|dienstag[s]?|mittwoch[s]?|donnerstag[s]?|freitag[s]?|samstag[s]?|sonntag[s]?).*[0-2][0-9][.:][0-5][0-9].*[0-2][0-9][.:][0-5][0-9]\s*uhr?)', re.IGNORECASE)
        # inhalt holds the text extracted from the response (extraction not shown here)
        zeit = zeitpattern.findall(inhalt)
        print(zeit)
The output is: ('dienstags von 20.00 Uhr bis 21.30 Uhr', 'dienstags')
Why is 'dienstags' returned a second time on its own?
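
For reference, here is a minimal reproduction outside Scrapy. The sample text in `inhalt` is made up to match the output above, and the names `tage`, `zeiten`, `capturing` and `non_capturing` are only for illustration. As far as I understand, `re.findall` returns one tuple item per capturing group, so the inner weekday group would account for the extra 'dienstags'; turning it into a non-capturing group with `(?:...)` seems to leave only the whole string.

```python
import re

# Hypothetical sample text, chosen only to reproduce the output shown above.
inhalt = "Wir proben dienstags von 20.00 Uhr bis 21.30 Uhr im Gemeindehaus."

# The two halves of the original pattern, split up for readability.
tage = r"montag[s]?|dienstag[s]?|mittwoch[s]?|donnerstag[s]?|freitag[s]?|samstag[s]?|sonntag[s]?"
zeiten = r".*[0-2][0-9][.:][0-5][0-9].*[0-2][0-9][.:][0-5][0-9]\s*uhr?"

# Original form: two capturing groups, so findall yields a tuple per match.
capturing = re.compile(r"\s((" + tage + r")" + zeiten + r")", re.IGNORECASE)

# Inner group made non-capturing with (?:...): one group left, so findall
# yields plain strings.
non_capturing = re.compile(r"\s((?:" + tage + r")" + zeiten + r")", re.IGNORECASE)

with_groups = capturing.findall(inhalt)
without_groups = non_capturing.findall(inhalt)

print(with_groups)     # [('dienstags von 20.00 Uhr bis 21.30 Uhr', 'dienstags')]
print(without_groups)  # ['dienstags von 20.00 Uhr bis 21.30 Uhr']
```

An alternative under the same assumptions would be to keep both groups and pick the first item of each tuple, for example `[m[0] for m in with_groups]`.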