Regex for existence of some words whose order doesn't matter

Question

I would like to write a regex for searching for the existence of some words, but their order of appearance doesn't matter.

For example, search for "Tim" and "stupid". My regex is Tim.*stupid|stupid.*Tim. But is it possible to write a simpler regex (e.g. so that the two words appear just once in the regex itself)?

You can split it into two searches, if that's simpler for you, since order doesn't matter — C.B., Jul 09 '14 at 14:17
did you want to match the whole line which contains Tim and stupid strings? — Avinash Raj, Jul 09 '14 at 14:24
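
The two-search idea from the first comment could look roughly like this in Python. This is only a sketch: the helper name `contains_all` is hypothetical, and it assumes plain substring matching with no word boundaries, like the regex in the question.

```python
import re

def contains_all(text, words):
    """Return True if every word occurs somewhere in text, in any order."""
    # One independent search per word; re.escape keeps each word literal.
    return all(re.search(re.escape(word), text) for word in words)

print(contains_all("stupid Tim!", ["Tim", "stupid"]))         # True
print(contains_all("Tim foobar barfoo.", ["Tim", "stupid"]))  # False
```

Each word appears only once in the call, which is what the question asks for, at the cost of scanning the text once per word.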

score 44 · Accepted Answer · edited May 23 '17 at 12:02

See this regex:

/^(?=.*Tim)(?=.*stupid).+/

Regex explanation:

^ Asserts position at start of string.
(?=.*Tim) Asserts that "Tim" is present in the string.
(?=.*stupid) Asserts that "stupid" is present in the string.
.+ Now that both phrases are known to be present, the string is valid. Use .+ (or the possessive .++, where the regex engine supports possessive quantifiers) to match the entire string.

To require more words, add another (?=.*<to_assert>) group for each one. The regex can also be shortened to /^(?=.*Tim).*stupid/, but note that this form only matches up to and including "stupid", not the entire string.

See a regex demo!

>>> import re
>>> text = """
... Tim is so stupid.
... stupid Tim!
... Tim foobar barfoo.
... Where is Tim?"""
>>> m = re.findall(r'^(?=.*Tim)(?=.*stupid).+$', text, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']
>>> m = re.findall(r'^(?=.*Tim).*stupid', text, re.MULTILINE)
>>> m  # the shortened form stops matching at "stupid"
['Tim is so stupid', 'stupid']
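
For more than two words, the lookahead groups can be generated instead of typed by hand. The following is a minimal sketch, not part of the original answer: `all_words_pattern` is a hypothetical helper name, and it assumes the words should be treated as literal substrings (hence `re.escape`).

```python
import re

def all_words_pattern(words):
    """Build a multiline regex matching whole lines that contain every word."""
    # One (?=.*word) lookahead per word; re.escape guards regex metacharacters.
    lookaheads = "".join(f"(?=.*{re.escape(word)})" for word in words)
    return re.compile(f"^{lookaheads}.+$", re.MULTILINE)

text = "Tim is so stupid.\nstupid Tim!\nTim foobar barfoo.\nWhere is Tim?"
pattern = all_words_pattern(["Tim", "stupid"])
print(pattern.findall(text))
# ['Tim is so stupid.', 'stupid Tim!']
```

Because each word gets its own lookahead, the order of the list passed in does not affect which lines match.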

Read more:

Regex with exclusion chars and another regex

That's the way, +1 :) ... A little commentary in case anyone uses it: the `^` is particularly important, because without it, if the lookaheads fail at the beginning of the string, the engine will move to the next position and try again. On the other hand the `$` could be dropped as the `.+` guarantees we will reach the end of the string. — zx81, Jul 20 '14 at 00:08
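
To illustrate the comment above: for these positive lookaheads the anchored and unanchored patterns should report the same lines, so the retries show up as extra work rather than as different results. A small sketch, using the sample lines from the answer (the variable names are just for illustration):

```python
import re

lines = ["Tim is so stupid.", "stupid Tim!", "Tim foobar barfoo."]

anchored = re.compile(r"^(?=.*Tim)(?=.*stupid).+")
unanchored = re.compile(r"(?=.*Tim)(?=.*stupid).+")

# With ^, a failing line is rejected after one attempt at position 0.
anchored_hits = [bool(anchored.search(line)) for line in lines]
# Without ^, search() retries the lookaheads at every later position first.
unanchored_hits = [bool(unanchored.search(line)) for line in lines]

print(anchored_hits)                     # [True, True, False]
print(anchored_hits == unanchored_hits)  # True
```

In Python, `re.match` only attempts a match at the start of the string, so it gives the same single-attempt behaviour even when the `^` is left out.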

hwnd · Answer 2 · 2014-07-09T14:37:00.963

You can use Positive Lookahead to achieve this. The lookahead approach is nice for matching strings that contain both substrings regardless of order.

pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$')

Example:

>>> import re
>>> s = '''Hey there stupid, hey there Tim
... Hi Tim, this is stupid
... Hi Tim, this is great'''
>>> pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$', re.M)
>>> pattern.findall(s)
['Hey there stupid, hey there Tim', 'Hi Tim, this is stupid']
