python regex end or beginning of a sentence

Question

I have the code below. I want to find occurrences of the word "eve".

My conditions are:

there shouldn't be any digits or letters directly before or after "eve"
before 'eve' there could be a space or nothing
after 'eve' there could be symbols or nothing

The code below finds ['eve?', 'eve '] but fails to find the last eve. How should I change the code?

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'\beve[^a-zA-Z\d:]', your_string)

I tried the code below, where I try to express that after 'eve' there could be either \b or [^a-zA-Z\d:], but it didn't work:

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'\beve(\b|[^a-zA-Z\d:])', your_string)

@doom87er When I run my last set of code I get the output `['', '', '']`, while when I run the first set of code above, I get `['eve', 'eve', 'eve']`. Any idea why? – user2543622, Oct 31 '18 at 17:30
It looks like it's taking the contents of your capture group instead of the full match. Change it to a non-capturing group `(?:...)`, so it looks like this: `\beve(?:\b|[^a-zA-Z\d:])` – doom87er, Oct 31 '18 at 17:34
Thanks. Could you also explain what `:` stands for in `[^a-zA-Z\d:]`? – user2543622, Oct 31 '18 at 17:40
`[^...]` is a negated character class. `:` has no special meaning in a character class, so it simply adds the literal `:` character to the class. – doom87er, Oct 31 '18 at 17:52
If I try `re.findall(r'\beve(?:\b|[^:])', your_string)` it finds `eve` as well as `eve:`... so what is the purpose of `:`? – user2543622, Oct 31 '18 at 20:14
The expression `(?:\b|[^:])` is an alternation (a Boolean OR) that matches either a word boundary or a character that is not ':'. It sees the boundary between 'eve' and ':' and matches that instead. – doom87er, Nov 01 '18 at 14:05
If you want to learn more about how regexes work, you can find some great resources here: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – doom87er, Nov 01 '18 at 14:06
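
The behaviour discussed in these comments can be reproduced with a short sketch. It uses the question's own string and patterns; the variable names `captured`, `whole` and `colon`, and the test string "eve: eve", are only illustrative.

```python
import re

your_string = "eve? is a rule, but not evening and not never. eve eve"

# With a capturing group, re.findall returns only what the group captured.
# The \b branch is tried first and matches the empty string each time.
captured = re.findall(r'\beve(\b|[^a-zA-Z\d:])', your_string)
print(captured)  # ['', '', '']

# With a non-capturing group, re.findall returns the whole match.
whole = re.findall(r'\beve(?:\b|[^a-zA-Z\d:])', your_string)
print(whole)  # ['eve', 'eve', 'eve']

# \b already succeeds between 'eve' and ':', so the [^:] branch is never tried.
colon = re.findall(r'\beve(?:\b|[^:])', "eve: eve")
print(colon)  # ['eve', 'eve']
```

Because the `\b` alternative always wins when it can, the character class after the `|` only matters in positions where there is no word boundary.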

score 3 · Accepted Answer · answered Oct 31 '18 at 16:18

Use a word boundary on each side:

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
print(re.findall(r'\beve\b', your_string))

Output:

['eve', 'eve', 'eve']
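
One caveat worth checking against the stated conditions: `\b` treats the underscore as a word character, so it is slightly stricter than "no letters or digits". The sketch below compares the two on a hypothetical test string (`sample` is not from the question); whether `_` should count as a symbol is an assumption left to the reader.

```python
import re

# Hypothetical test string, not taken from the question.
sample = "eve_ eve1 eve! eve"

# \b counts '_' and digits as word characters, so 'eve_' and 'eve1' are rejected.
with_boundary = re.findall(r'\beve\b', sample)
print(with_boundary)  # ['eve', 'eve']

# Explicit lookarounds reject only letters a-z/A-Z and digits, so 'eve_' is accepted.
with_lookaround = re.findall(r'(?<![a-zA-Z\d])eve(?![a-zA-Z\d])', sample)
print(with_lookaround)  # ['eve', 'eve', 'eve']
```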

Toto


Nahuel Fouilleul · Answer 2 · 2018-10-31T16:34:11.833

0

The conditions are redundant, but each can be translated into a regex, and they can be used together:

condition 1: add (?<![a-zA-Z\d]) before and (?![a-zA-Z\d]) after
condition 2: add (?:^|(?<=\s)) before
condition 3: add (?:$|(?=[^a-zA-Z\d])) after

updated with code

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'(?<![a-zA-Z\d])(?:^|(?<=\s))eve(?![a-zA-Z\d])(?:$|(?=[^a-zA-Z\d]))', your_string)
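
To see what each piece contributes, the three fragments can be tried one at a time on the same string. This is only a sketch; the names `no_alnum`, `space_before` and `symbol_after` are illustrative and not part of the answer.

```python
import re

your_string = "eve? is a rule, but not evening and not never. eve eve"

# Condition 1: no letter or digit directly before or after 'eve'.
no_alnum = re.findall(r'(?<![a-zA-Z\d])eve(?![a-zA-Z\d])', your_string)
# Condition 2: start of string or whitespace directly before 'eve'.
space_before = re.findall(r'(?:^|(?<=\s))eve', your_string)
# Condition 3: end of string or a non-alphanumeric character after 'eve'.
symbol_after = re.findall(r'eve(?:$|(?=[^a-zA-Z\d]))', your_string)

print(len(no_alnum))      # 3
print(len(space_before))  # 4, because it also matches the 'eve' in 'evening'
print(len(symbol_after))  # 3
```

On this input, condition 2 alone lets "evening" through, while conditions 1 and 3 alone each already give the three wanted matches.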

edited Oct 31 '18 at 16:34

answered Oct 31 '18 at 16:28

Nahuel Fouilleul


You just need to insert the expression in the right place; this should not be an issue anyway. – Nahuel Fouilleul Oct 31 '18 at 16:32
SO is not just for finding code to copy and paste, but for understanding how it works. – Nahuel Fouilleul Oct 31 '18 at 16:34
The negative lookarounds are redundant with the positive ones. If `(?:^|(?<=\s))` is satisfied, of course `(? – Toto Oct 31 '18 at 16:40
Maybe the point is that if a character is consumed by a previous match, the position in the input string moves past that character, so it cannot be matched again by the next match (like the space between the last two `eve`s). Lookarounds, however, do not consume input, so the cursor position doesn't change. – Nahuel Fouilleul Oct 31 '18 at 16:44
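
The point about consumed characters can be illustrated with a minimal sketch. The patterns below are simplified variants written for this example (they are not the ones from the answer), and `text` is a hypothetical input.

```python
import re

# Hypothetical input: two occurrences separated by a single space.
text = "eve eve"

# Consuming version: the space after the first 'eve' becomes part of the match,
# so it is no longer available to satisfy (?:^|\s) for the second 'eve'.
consuming = re.findall(r'(?:^|\s)eve(?:\s|$)', text)
print(consuming)  # ['eve ']

# Lookaround version: the space is only inspected, never consumed,
# so it can serve as the context for both occurrences.
lookaround = re.findall(r'(?:^|(?<=\s))eve(?=\s|$)', text)
print(lookaround)  # ['eve', 'eve']
```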
