While looking into regular expressions, I found the following example:

>>> import re
>>> p = re.compile('.*[.].*$')
>>> m = p.search('foo.bar')
>>> print(m.group())
foo.bar

I don't understand how it recognizes simple filenames with extensions, such as foo.bar, abc.xyz, or my_files.txt. I thought this code would work like this:

  1. `.` matches any character.
  2. `*` matches 0 or more repetitions.
  3. By 1 and 2, `.*` matches the whole string (`foo.bar`).
  4. `[.]` tries to find a `.` character, but there are no characters left.
  5. `.*$` doesn't do anything.
  6. No match is found.

I wonder how this code actually works.

im0j
  • By that logic you'd never be able to use `.*` in a regex because it would always match everything, which obviously isn't the case – Cory Kramer Apr 05 '18 at 16:21
  • `.*` matches `foo` because it backtracks so that `[.]` matches the `.`. The rest of the string is matched by `.*$`. You have to remember that regex engines *want* to match something, so they'll try until all possible tests have been exhausted. – ctwheels Apr 05 '18 at 16:23
  • Use regex101.com to explain it for you: https://regex101.com/r/nYevrx/1 – Harvey Apr 05 '18 at 16:24

1 Answer

The `*` quantifier is greedy: it causes the regex engine to match as much as possible while still letting the rest of the pattern match, not unconditionally everything.

Typically, the regex engine will match through the end of line like you describe, but then backtrack to an earlier position until it can proceed with the rest of the match.
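
One way to see where each part of the pattern ends up after backtracking is to wrap the three pieces in capture groups. The groups and the extra test strings below are additions for illustration and are not part of the example in the question.

```python
import re

# Same pattern as in the question, with capture groups added for illustration.
p = re.compile(r'(.*)([.])(.*)$')

m = p.search('foo.bar')
print(m.groups())    # ('foo', '.', 'bar')

# With several dots, the greedy first group gives back as little as it can,
# so it ends just before the last dot.
m2 = p.search('a.b.c')
print(m2.groups())   # ('a.b', '.', 'c')

# Without a dot, backtracking runs out of positions to try and there is no match.
m3 = p.search('foobar')
print(m3)            # None
```

The first group holds `foo` rather than the whole string, which shows that the initial `.*` gave characters back so that `[.]` could match.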

Maybe think of it sort of like a labyrinth solver that systematically explores every possible junction of the labyrinth until it finds an exit or exhausts the search space.
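
The search can be sketched by hand for this one pattern. The function below is a toy model, not how the `re` module is actually implemented: `match_with_backtracking` is a hypothetical name, it only tries matches starting at position 0, and it assumes the text contains no newlines.

```python
def match_with_backtracking(text):
    """Toy model of matching '.*[.].*$' against text (illustration only)."""
    attempts = []
    # The first .* starts by grabbing everything, then gives back
    # one character at a time.
    for end in range(len(text), -1, -1):
        attempts.append(text[:end])
        # [.] needs a literal dot right after what the first .* consumed.
        if end < len(text) and text[end] == '.':
            # The trailing .*$ accepts whatever is left.
            return (text[:end], '.', text[end + 1:]), attempts
    # Every split point was tried and none worked.
    return None, attempts


result, attempts = match_with_backtracking('foo.bar')
print(attempts)  # ['foo.bar', 'foo.ba', 'foo.b', 'foo.', 'foo']
print(result)    # ('foo', '.', 'bar')
```

Each entry in `attempts` is a candidate for what the first `.*` consumes. The first four are dead ends, and the fifth leaves a `.` for `[.]` to match.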

tripleee