Inform 7 text-adventure code can make heavy use of the compass directions north, south, east, west, northwest, southwest, southeast, and northeast. I am developing a code-verification script, and one of its tasks is to find instances of these words. My first attempt used brute force:
```python
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')
# note: this list could be generated with itertools.product from ['north', 'south', '']
# and ['east', 'west', ''] (dropping the empty combination), but that's another exercise.
directions = ['north', 'south', 'east', 'west', 'northwest', 'southwest', 'southeast', 'northeast']
# \b on both sides keeps 'west' from matching inside 'westerly'
regstr = r'\b({0})\b'.format('|'.join(directions))
print(re.findall(regstr, sample_line))
```
This worked and gave me what I wanted, `['east', 'northeast', 'northwest']`, while ignoring `westerly`.
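For reference, the idea in the first snippet's comment can be made concrete. This is a minimal sketch, assuming Python 3: `itertools.product` (rather than `zip`, which pairs elements positionally instead of forming every combination) joins each optional north/south prefix with each optional east/west suffix, and the single empty combination is dropped. The order of the alternatives does not change the result here, because the `\b` anchors make the engine backtrack into another alternative whenever a shorter word such as `north` is only a prefix of the text.

```python
import itertools
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')

# Every (prefix, suffix) pair: ('north', 'east') -> 'northeast',
# ('north', '') -> 'north', ('', 'east') -> 'east'.
# The ('', '') pair would give the empty string, so it is filtered out.
directions = [ns + ew
              for ns, ew in itertools.product(['north', 'south', ''], ['east', 'west', ''])
              if ns + ew]

regstr = r'\b({0})\b'.format('|'.join(directions))
print(directions)
print(re.findall(regstr, sample_line))
```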
I wanted to exploit the symmetry of the compass words to shorten the regex further. But my preferred factoring, with both the north/south part and the east/west part optional, left open the possibility of a zero-length match. So I came up with this:
```python
# findall returns one tuple per match (one entry per group); x[0] is the outer group
regstr2 = r'\b(north|south|(north|south)?(east|west))\b'
print(sample_line)
print([x[0] for x in re.findall(regstr2, sample_line)])
```
This worked, but it felt inelegant, since `north|south` has to be written twice.
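One small tidy-up for `regstr2` that does not change what it matches is to make the groups non-capturing. A sketch (the name `regstr2_nc` is only illustrative): when a pattern has no capturing groups, `re.findall` returns each whole match as a plain string, so the `[x[0] for x in ...]` step is no longer needed.

```python
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')

# Same alternation as regstr2, but (?:...) groups capture nothing,
# so re.findall returns the full matched text for each match.
regstr2_nc = r'\b(?:north|south|(?:north|south)?(?:east|west))\b'
print(re.findall(regstr2_nc, sample_line))
```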
My third try, with help from this link, was:
```python
# (?=.) was meant to stop zero-length matches
regstr3 = r'(?=.)(\b(north|south)?(east|west)?\b)'
print(sample_line)
print([x[0] for x in re.findall(regstr3, sample_line)])
```
This got the three directions I want, but it also returned many zero-length matches that I had hoped to exclude, even with the recommended `(?=.)`.
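A likely reason `(?=.)` does not help: a lookahead only inspects what follows the current position, so it confirms that *some* character comes next but never forces the match to consume one. With both inner groups optional, `\b` followed immediately by `\b` succeeds at every word boundary that has a character after it. The sketch below, assuming Python 3, lists the raw matches and filters out the empty ones, which is the kind of obvious workaround the question would like to avoid.

```python
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')
regstr3 = r'(?=.)(\b(north|south)?(east|west)?\b)'

# group(1) is the outer group, the same value as x[0] in the findall version
matches = [m.group(1) for m in re.finditer(regstr3, sample_line)]

# Both inner groups are optional, so the outer group may match nothing:
# the engine only needs \b, a following character for (?=.), and \b again.
nonempty = [s for s in matches if s]
print(matches.count(''), 'empty matches')
print(nonempty)
```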
Is there a way to get a variant of `regstr3` to work in Python? There are obvious workarounds, but it would be pleasing to have a tidy regex without lots of repeated, similar words.
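One possible variant of `regstr3` that keeps its factored shape, sketched under the assumption that every target word consists only of word characters (true for the eight compass words): add `(?=\w)` after the first `\b` and `(?<=\w)` after the last. An empty match would need a word character both before and after the same position, and then `\b` could not succeed there, so every match must consume at least one letter. The names `regstr4` and `regstr5` are illustrative; `regstr5` is an alternative with no lookarounds that reads the words as "north/south with an optional east/west suffix, or a bare east/west".

```python
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')

# regstr3's factored shape, plus two lookarounds that rule out empty matches:
#   (?=\w)  the match must start before a word character, so it cannot be
#           an empty match sitting at the end of a word;
#   (?<=\w) the match must end after a word character, so it cannot be
#           an empty match sitting at the start of a word.
regstr4 = r'\b(?=\w)(?:north|south)?(?:east|west)?\b(?<=\w)'

# Alternative without lookarounds: north/south with an optional east/west
# suffix, or a bare east/west.
regstr5 = r'\b(?:(?:north|south)(?:east|west)?|east|west)\b'

print(re.findall(regstr4, sample_line))
print(re.findall(regstr5, sample_line))
```

The lookaround version stays closest to `regstr3`; the alternation version is arguably easier to read, at the cost of writing `east|west` twice.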