I am working on a personal project and am stuck on splitting a text into blocks, where each block starts at a month abbreviation and runs up to the next one.
A sample input looks like this:
text = "apr25, 2016\nblah blah\npow\nmay22, 2017\nasdf rtys\nqwer\njan9, 2018\npoiu\nlkjhj yertt"
I expect output of the form:
[ ("apr25, 2016\nblah blah\npow\n"), ("may22, 2017\nasdf rtys\nqwer\n"), ("jan9, 2018\npoiu\nlkjhj yertt") ]
I tried a simple regex, but neither the greedy nor the non-greedy version gives the desired output:
import re
# Greedy version
REGEX_MONTHS_TEXT = re.compile(r'(apr[\w\W]*)|(may[\w\W]*)|(jan[\w\W]*)')
REGEX_MONTHS_TEXT.findall(text)
# output: [('apr25, 2016\nblah blah\npow\nmay22, 2017\nasdf rtys\nqwer\njan9, 2018\npoiu\nlkjhj yertt', '', '')]
# Non-greedy version
REGEX_MONTHS_TEXT = re.compile(r'(apr[\w\W]*?)|(may[\w\W]*?)|(jan[\w\W]*?)')
REGEX_MONTHS_TEXT.findall(text)
# output: [('apr', '', ''), ('', 'may', ''), ('', '', 'jan')]
Can you help me produce the desired output with a Python 3 regex?
Or do I need to write custom Python 3 code to produce the desired output?
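
For reference, here is a minimal sketch of one regex-only approach that seems to work on the sample above. The greedy version fails because `[\w\W]*` swallows the rest of the string, and the non-greedy version fails because `[\w\W]*?` has nothing forcing it to expand, so it matches zero characters. Adding a lookahead for the next month abbreviation (or the end of the string) gives the lazy quantifier a stopping point. The names `MONTHS` and `chunks` are only illustrative, and the pattern assumes that only the three abbreviations from the sample matter and that they never occur inside ordinary words in the body text.

```python
import re

text = "apr25, 2016\nblah blah\npow\nmay22, 2017\nasdf rtys\nqwer\njan9, 2018\npoiu\nlkjhj yertt"

# Assumption: only the abbreviations from the sample are needed;
# extend the alternation for the remaining months.
MONTHS = r"(?:apr|may|jan)"

# A month abbreviation, then as little text as possible (newlines included)
# up to the next month abbreviation or the end of the string.
# The group is non-capturing, so findall returns whole matches as plain strings.
REGEX_MONTHS_TEXT = re.compile(rf"{MONTHS}[\w\W]*?(?={MONTHS}|\Z)")

chunks = REGEX_MONTHS_TEXT.findall(text)
print(chunks)
```

If the body text can contain words such as "maybe" or "january", the pattern would probably need tightening, for example by requiring a digit right after the abbreviation or by anchoring it to the start of a line with `re.MULTILINE`.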