
I have the code below. I want to find occurrences of the word "eve".

My conditions are:

  1. There shouldn't be any digits or letters immediately before or after "eve".

  2. Before "eve" there can be a space or nothing.

  3. After "eve" there can be symbols or nothing.

The code below finds ['eve?', 'eve '] but fails to find the last "eve". How should I change the code?

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'\beve[^a-zA-Z\d:]', your_string)

I also tried the code below, where I try to express that after "eve" there can be either \b or [^a-zA-Z\d:], but it didn't work:

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'\beve(\b|[^a-zA-Z\d:])', your_string)
user2543622
  • what about using negative look arounds assertions `r'(? – Nahuel Fouilleul Oct 31 '18 at 16:19
  • Seems to work fine for me: https://regex101.com/r/aqVD6w/1 – doom87er Oct 31 '18 at 17:11
  • @doom87er When I run my last snippet I get the output `['', '', '']`, while when I run the first snippet above I get `['eve', 'eve', 'eve']`. Any idea why that is? – user2543622 Oct 31 '18 at 17:30
  • It looks like it's returning the contents of your capture group instead of the full match. Change it to a non-capturing group `(?:...)`, so it looks like this: `\beve(?:\b|[^a-zA-Z\d:])` – doom87er Oct 31 '18 at 17:34
  • Thanks. Could you also explain what `:` stands for in `[^a-zA-Z\d:]`? – user2543622 Oct 31 '18 at 17:40
  • `[^...]` is a negated character class. `:` has no special meaning inside a character class, so it simply adds the literal `:` character to the class. – doom87er Oct 31 '18 at 17:52
  • I didn't get that. Could you provide an example? – user2543622 Oct 31 '18 at 20:07
  • If I try `re.findall(r'\beve(?:\b|[^:])', your_string)` it finds `eve` as well as `eve:`, so what is the purpose of `:`? – user2543622 Oct 31 '18 at 20:14
  • @doom87er please reply if possible – user2543622 Nov 01 '18 at 13:23
  • The expression `(?:\b|[^:])` is an alternation (a Boolean OR) that matches either a word boundary OR a character that is not ':'. It sees the boundary between 'eve' and ':' and matches that instead. – doom87er Nov 01 '18 at 14:05
  • If you want to learn more about how regexes work, you can find some great resources here: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – doom87er Nov 01 '18 at 14:06
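
The points raised in the comments above can be checked with a short sketch. It reuses the question's string; the second string, `colon_string`, is a made-up input added only to illustrate the role of `:` in the character class.

```python
import re

your_string = "eve? is a rule, but not evening and not never. eve eve"

# The original pattern must consume one character after "eve",
# so the final "eve" at the end of the string cannot match.
needs_char = re.findall(r'\beve[^a-zA-Z\d:]', your_string)

# With a capturing group, findall returns the group's text, not the match.
# The \b alternative matches an empty string, hence the empty results.
capturing = re.findall(r'\beve(\b|[^a-zA-Z\d:])', your_string)

# A non-capturing group makes findall return the whole match again.
non_capturing = re.findall(r'\beve(?:\b|[^a-zA-Z\d:])', your_string)

print(needs_char)     # ['eve?', 'eve ']
print(capturing)      # ['', '', '']
print(non_capturing)  # ['eve', 'eve', 'eve']

# Hypothetical input to show what ":" does inside the negated class.
colon_string = "eve: eve;"

# Without \b, ":" is excluded as a following character, so only "eve;" matches.
class_only = re.findall(r'\beve[^a-zA-Z\d:]', colon_string)

# With \b as an alternative, the boundary after "eve" matches first,
# so the ":" exclusion never comes into play.
with_boundary = re.findall(r'\beve(?:\b|[^:])', colon_string)

print(class_only)     # ['eve;']
print(with_boundary)  # ['eve', 'eve']
```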

2 Answers


Use word boundary on each side:

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
print(re.findall(r'\beve\b', your_string))

Output:

['eve', 'eve', 'eve']
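
One caveat, offered as a sketch rather than as part of the answer itself: `\b` treats the underscore as a word character, while the question's first condition only rules out letters and digits. The string `sample` below is a made-up input chosen to show where the two readings differ; whether that difference matters depends on the data.

```python
import re

# Hypothetical input: "eve" followed by "_", by a digit, by a comma, and by nothing.
sample = "eve_1 eve2 eve, eve"

# \b sees no boundary between "e" and "_", so "eve_1" is skipped.
word_boundary = re.findall(r'\beve\b', sample)

# Lookarounds that only exclude ASCII letters and digits accept "eve_1".
explicit = re.findall(r'(?<![a-zA-Z\d])eve(?![a-zA-Z\d])', sample)

print(word_boundary)  # ['eve', 'eve']
print(explicit)       # ['eve', 'eve', 'eve']
```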
Toto

The conditions are redundant, but each one can be translated into a regex, and they can be used together:

  1. add (?<![a-zA-Z\d]) before and (?![a-zA-Z\d]) after
  2. add (?:^|(?<=\s)) before
  3. add (?:$|(?=[^a-zA-Z\d])) after

updated with code

import re
your_string = "eve? is a rule, but not evening and not never. eve eve"
re.findall(r'(?<![a-zA-Z\d])(?:^|(?<=\s))eve(?![a-zA-Z\d])(?:$|(?=[^a-zA-Z\d]))', your_string)
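
To see how the three pieces behave on their own, here is a minimal sketch that applies each condition separately and then the combined pattern to the question's string. The variable names are illustrative and not part of the answer.

```python
import re

your_string = "eve? is a rule, but not evening and not never. eve eve"

# Condition 1: no letter or digit directly before or after "eve".
cond1 = re.findall(r'(?<![a-zA-Z\d])eve(?![a-zA-Z\d])', your_string)

# Condition 2: start of string or whitespace before "eve".
# On its own this also accepts the "eve" in "evening".
cond2 = re.findall(r'(?:^|(?<=\s))eve', your_string)

# Condition 3: end of string or a non-alphanumeric character after "eve".
cond3 = re.findall(r'eve(?:$|(?=[^a-zA-Z\d]))', your_string)

# All three combined, as in the answer.
combined = re.findall(
    r'(?<![a-zA-Z\d])(?:^|(?<=\s))eve(?![a-zA-Z\d])(?:$|(?=[^a-zA-Z\d]))',
    your_string,
)

print(len(cond1), len(cond2), len(cond3))  # 3 4 3
print(combined)                            # ['eve', 'eve', 'eve']
```

On this input, condition 1 alone already yields the same three matches as the combined pattern, which is consistent with the remark that the conditions overlap.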
Nahuel Fouilleul
  • You just need to insert the expression in the right place; this should not be an issue anyway. – Nahuel Fouilleul Oct 31 '18 at 16:32
  • SO is not just for finding code to copy and paste, but for understanding how it works. – Nahuel Fouilleul Oct 31 '18 at 16:34
  • The negative lookarounds are redundant with the positive ones. If `(?:^|(?<=\s))` is satisfied, of course `(? – Toto Oct 31 '18 at 16:40
  • Maybe the point is that if a character is consumed by a previous match, the position in the input string will be after that character, so it cannot be matched again by the next match, like the space between the last two `eve`s. However, as lookarounds do not consume input, the cursor position doesn't change. – Nahuel Fouilleul Oct 31 '18 at 16:44
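
The last comment's point about consumed characters can be illustrated with a small sketch. The string `tail` is a made-up input (the last two words of the question's string), and both patterns are simplified stand-ins for illustration, not the patterns from the answers.

```python
import re

# Hypothetical input: two occurrences separated by a single space.
tail = "eve eve"

# Consuming version: the trailing \s eats the space, so the second "eve"
# has neither a start-of-string nor a leading space left to match.
consuming = re.findall(r'(?:^|\s)eve(?:\s|$)', tail)

# Lookaround version: the space is only inspected, never consumed,
# so it is still available as context for the second match.
lookaround = re.findall(r'(?:^|(?<=\s))eve(?=\s|$)', tail)

print(consuming)   # ['eve ']
print(lookaround)  # ['eve', 'eve']
```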