I'm trying to write a regex that finds all characters between a starting token ('MS' or 'PhD') and an ending token ('.' or '!'). What makes this tricky is that both starting tokens often appear in the same text, and I'm only interested in the characters bounded by the last starting token and the first ending token that follows it. (I want all such occurrences.)
start = 'MS|PhD'
end = r'\.|!'
input1 = "Candidate with MS or PhD in Statistics, Computer Science, or similar field."
output1 = "in Statistics, Computer Science, or similar field"
input2 = "Applicant with MS in Biology or Chemistry desired."
output2 = "in Biology or Chemistry desired"
Here's my best attempt, which is currently returning an empty list:
import re

# start token, any characters, end token
pattern = r'^(MS|PhD) .* (\.|!)$'
re.findall(pattern, "candidate with MS in Chemistry.")
>>>
[]
Could someone point me in the right direction?
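For reference, here is a minimal sketch of one possible direction, not a definitive solution. Two things in the attempt above rule out a match on the sample string: `^` and `$` force the starting token to sit at the very beginning of the text, and the literal spaces around `.*` require a space right before the ending token. The sketch drops the anchors and uses a negative lookahead so the captured text can never contain another starting token, which leaves only the span after the last one. The names `START`, `END`, `PATTERN`, and `extract_spans` are hypothetical, and treating the tokens as whole words (`\b`) is an assumption about the data.

```python
import re

# Hypothetical names, chosen for illustration only.
START = r"\b(?:MS|PhD)\b"  # starting tokens, assumed to appear as whole words
END = r"[.!]"              # ending tokens

# After a starting token, consume characters one at a time, refusing to step
# onto another starting token or an ending token, then require an ending token.
PATTERN = re.compile(
    START
    + r"\s+"
    + r"((?:(?!" + START + r")[^.!])*)"
    + END
)

def extract_spans(text):
    """Return every span between the last start token and the first end token."""
    # With exactly one capture group, findall returns a list of strings.
    return PATTERN.findall(text)

print(extract_spans("Candidate with MS or PhD in Statistics, Computer Science, or similar field."))
# ['in Statistics, Computer Science, or similar field']
print(extract_spans("Applicant with MS in Biology or Chemistry desired."))
# ['in Biology or Chemistry desired']
```

Because the pattern is not anchored, `findall` keeps scanning after each match, so a text with several token-bounded spans yields one entry per span.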