
I'm scraping websites with Scrapy and filtering information such as the day and time with regular expressions. I get the whole matched string, but an additional part of that string is returned as well. How can I exclude this part so that only the whole string is returned?

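import re

import scrapy


class posSpider(scrapy.Spider):
    name = "pos"  # hypothetical name: Scrapy requires every spider to define one
    start_urls = ["https://posaunenchor-eibach.jimdofree.com/"]

    def parse(self, response):
        # Assumption: `inhalt` is the page's visible text joined into one string
        inhalt = " ".join(response.xpath("//body//text()").getall())
        zeitpattern = re.compile(r'\s((montag[s]?|dienstag[s]?|mittwoch[s]?|donnerstag[s]?|freitag[s]?|samstag[s]?|sonntag[s]?).*[0-2][0-9][.:][0-5][0-9].*[0-2][0-9][.:][0-5][0-9]\s*uhr?)', re.IGNORECASE)
        zeit = zeitpattern.findall(inhalt)
        print(zeit)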

The output is: ('dienstags von 20.00 Uhr bis 21.30 Uhr', 'dienstags')
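To isolate the regex from Scrapy, the matching step can be reproduced in plain Python. This is a minimal sketch: the sample sentence is a hypothetical stand-in for the scraped text in `inhalt`, since the page content itself is not shown here.

```python
import re

# Hypothetical sample text standing in for the scraped page content (`inhalt`).
inhalt = "Wir proben dienstags von 20.00 Uhr bis 21.30 Uhr im Gemeindehaus."

# Same pattern as in the spider: the outer (...) is group 1,
# the day alternation (montag[s]?|...) is group 2.
zeitpattern = re.compile(
    r'\s((montag[s]?|dienstag[s]?|mittwoch[s]?|donnerstag[s]?|freitag[s]?|samstag[s]?|sonntag[s]?)'
    r'.*[0-2][0-9][.:][0-5][0-9].*[0-2][0-9][.:][0-5][0-9]\s*uhr?)',
    re.IGNORECASE,
)

zeit = zeitpattern.findall(inhalt)
print(zeit)                # [('dienstags von 20.00 Uhr bis 21.30 Uhr', 'dienstags')]
print(zeitpattern.groups)  # 2
```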

Why is 'dienstags' returned a second time on its own?
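The extra element comes from the capturing groups: when a pattern contains more than one capturing group, `re.findall` returns one tuple per match holding the text of every group, not the whole match. Here the outer `(...)` is group 1 and the day alternation is group 2. A minimal sketch of one possible fix, using the same hypothetical sample text, turns the inner alternation into a non-capturing group `(?:...)` so that only one group remains; with exactly one group, `findall` returns a plain list of strings.

```python
import re

# Hypothetical sample text standing in for the scraped page content (`inhalt`).
inhalt = "Wir proben dienstags von 20.00 Uhr bis 21.30 Uhr im Gemeindehaus."

# Day alternation is now non-capturing (?:...); `s?` replaces the repeated `[s]?`.
# Only the outer group captures, so the leading \s is matched but not returned.
zeitpattern = re.compile(
    r'\s((?:montag|dienstag|mittwoch|donnerstag|freitag|samstag|sonntag)s?'
    r'.*[0-2][0-9][.:][0-5][0-9].*[0-2][0-9][.:][0-5][0-9]\s*uhr?)',
    re.IGNORECASE,
)

# With exactly one capturing group, findall returns a list of strings.
zeit = zeitpattern.findall(inhalt)
print(zeit)  # ['dienstags von 20.00 Uhr bis 21.30 Uhr']

# Alternative: finditer yields Match objects, so the wanted group is chosen explicitly.
zeit_alt = [m.group(1) for m in zeitpattern.finditer(inhalt)]
print(zeit_alt)  # ['dienstags von 20.00 Uhr bis 21.30 Uhr']
```

Keeping the outer group, rather than removing all groups, means the leading `\s` still has to match but is not part of the result. `re.finditer` is useful when several groups of one match are needed separately.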

Wiktor Stribiżew
