
First, I know that x(?=y) matches 'x' only if 'x' is followed by 'y'.
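That statement can be checked quickly in Python's `re` module. The `r'...'` strings in the question suggest Python, but that is an assumption. `findall` returns only the `x` characters that are followed by `y`, and the `y` itself is never part of the match.

```python
import re

# x(?=y): match 'x' only when the next character is 'y'.
# The lookahead checks for 'y' but does not include it in the match.
pattern = re.compile(r'x(?=y)')

print(pattern.findall('xy xz xy'))  # ['x', 'x']

m = pattern.search('axy')
print(m.group(), m.span())  # x (1, 2)
```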

  • But when I try r'^(?=.*[0-9])(?=.*[a-z])':

    • Why do both 0a and a0 match?
    • Why is the order not important at all?
    • For 0a, what does it match?
      • If it matches the empty string before 0, it should fail the second condition (?=.*[a-z]), because the empty string before 0 is followed by 0, not by a-z.
      • If it matches 0 because it is followed by a, it should fail the first condition, because 0 is not followed by [0-9].
      • I don't know what's wrong with my reasoning, and I'm not sure I've expressed myself clearly enough for you to understand what I mean.
  • And for r'^(?=.*[0-9])(?=.*[a-z])$': if the pattern above without $ works, why doesn't this one? I can't figure out what this matches; it seems not to match anything.
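The behaviour described above can be reproduced with a short Python sketch (again assuming Python's `re` module; the test strings are arbitrary). For each input it prints what the first pattern matched, as the matched text plus its `(start, end)` span, and whether the `$` variant matches at all.

```python
import re

pattern = re.compile(r'^(?=.*[0-9])(?=.*[a-z])')
with_dollar = re.compile(r'^(?=.*[0-9])(?=.*[a-z])$')

for s in ['0a', 'a0', '0', 'a', '']:
    m = pattern.search(s)
    # Matched text and its span, or None when the pattern does not match.
    found = (m.group(), m.span()) if m else None
    print(repr(s), found, with_dollar.search(s))
```

For `'0a'` and `'a0'` the first pattern reports an empty match with span `(0, 0)`, while the `$` variant returns `None` for every input tried.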

Thanks a lot for your help.

sgon00
  • Your regex consists of assertions only. You test if the string contains a digit. Then you test if the string contains a lowercase character. – The fourth bird Nov 17 '18 at 10:34
  • There is no ordering in lookahead groups. The regex engine makes every effort to match the input; it does not work the other way round and try to make the input fail. So the two lookaheads you specified act as two independent conditions, and the match succeeds only if both of them succeed. – Pushpesh Kumar Rajwanshi Nov 17 '18 at 10:34
  • For your last point: lookaheads only test the pattern; they don't consume any input. So after the lookaheads the engine is still at the start of the string, and $ can only match there if the string is empty. – Pushpesh Kumar Rajwanshi Nov 17 '18 at 10:47
  • @PushpeshKumarRajwanshi I think there is ordering in lookahead groups. After reading the accepted answer, I understood it. Both `0a` and `a0` match the empty string at the beginning; you can see this on regex101.com. The magic happens because both lookaheads contain `.*`. For example, if the rule changes to `'^(?=.*[0-9])(?=[a-z])'`, it won't match `"0a"`, because without `.*` before `[a-z]` the second lookahead needs a letter right at position 0. – sgon00 Nov 17 '18 at 15:45
  • @sgon00: Ordering is of course there, because the regex engine evaluates one lookahead first and only then the next, but I meant that the order of the lookarounds does not affect whether the overall match succeeds or fails. Depending on the input text, one order may perform better than the other, but since the input can be any string, the order does not matter. – Pushpesh Kumar Rajwanshi Nov 17 '18 at 15:54
  • @PushpeshKumarRajwanshi sorry, I think you are right. The ordering does not matter. Even without `.*`, `'^(?=.*[0-9])(?=[a-z])'` and `'^(?=[a-z])(?=.*[0-9])'` are the same. Thanks a lot for this clarification. – sgon00 Nov 17 '18 at 16:08
  • @sgon00: Yes the ordering doesn't change the regex overall :) Glad to give my little help :) – Pushpesh Kumar Rajwanshi Nov 17 '18 at 16:11
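The conclusion reached in these comments, that swapping the two lookaheads changes nothing while dropping the `.*` does, can be checked with a small Python sketch (Python's `re` is assumed; the test strings are arbitrary choices).

```python
import re

# The same two lookaheads in either order.
digit_then_letter = re.compile(r'^(?=.*[0-9])(?=.*[a-z])')
letter_then_digit = re.compile(r'^(?=.*[a-z])(?=.*[0-9])')
# Without .*, the second lookahead needs a letter at position 0 itself.
letter_first = re.compile(r'^(?=.*[0-9])(?=[a-z])')

for s in ['0a', 'a0', 'ab', '00']:
    print(repr(s),
          bool(digit_then_letter.search(s)),
          bool(letter_then_digit.search(s)),
          bool(letter_first.search(s)))
```

The first two columns always agree. The third differs only for `'0a'`, where position 0 holds a digit rather than a letter.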

1 Answer


regex101.com has a regex debugger which you can use to see exactly how the regex engine behaves.

A good point to note here is that the matches from your regex are always going to be zero-length, because lookaheads (?=...) don't consume any characters. They only look ahead to check for a pattern.
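In Python (assumed here, with an arbitrary test string), the zero-length match is visible directly: the matched text is empty, and the match starts and ends at the same index.

```python
import re

m = re.search(r'^(?=.*[0-9])(?=.*[a-z])', 'abc123')
# The lookaheads consume nothing, so the match starts and ends at index 0.
print(repr(m.group()), m.start(), m.end())  # '' 0 0
```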

As you may know, the regex engine will move from the start of the string to the end of the string as it matches the characters.
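To see the engine moving through the string, the sketch below drops the `^` anchor. This unanchored pattern is a hypothetical variant for illustration, not the regex from the question. `re.finditer` then reports every start position where both lookaheads succeed. Each match is still empty; only its position changes.

```python
import re

# Hypothetical unanchored variant: the engine may try every start position.
unanchored = re.compile(r'(?=.*[0-9])(?=.*[a-z])')

# In 'a0a', positions 0 and 1 still have a digit and a letter ahead of them;
# position 2 has only 'a' ahead, so the digit lookahead fails there.
print([m.span() for m in unanchored.finditer('a0a')])  # [(0, 0), (1, 1)]
```

With `^` in place, only position 0 is allowed, which is why the original pattern yields at most one empty match.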

Why does 0a match?

Initially, the engine is at the start of the string. It matches the start-of-string anchor ^. Then it checks whether it can find the pattern described in the lookahead (?=.*[0-9]). Can it? Yes: .* can match nothing, and [0-9] can match the 0. Then it checks the second lookahead. Note that the engine is still at its starting position. It checks (?=.*[a-z]): .* matches the 0 and [a-z] matches the a. This is the step missing from your reasoning: the lookahead does not need [a-z] immediately after the current position, because .* can skip over the 0 first. Both lookaheads succeed, so the overall match succeeds as an empty match at position 0.
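One way to see what `.*` ended up matching inside each lookahead is to wrap it in a capturing group; in Python's `re`, groups captured inside a positive lookahead remain available after it succeeds. The `traced` pattern is an instrumented copy made only for illustration; the groups do not change what the regex accepts.

```python
import re

# Same pattern, with (.*) capturing what .* matched in each lookahead.
traced = re.compile(r'^(?=(.*)[0-9])(?=(.*)[a-z])')

m = traced.search('0a')
print(m.groups())  # ('', '0'): .* matched nothing, then the 0
print(m.span())    # (0, 0): the overall match is still empty
```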

Why does a0 match?

This is pretty much the same as before. In the first lookahead, .* matches a and [0-9] matches the 0. In the second lookahead, .* matches nothing and [a-z] matches a.
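The same instrumented copy of the pattern (capture groups added only for illustration, Python's `re` assumed) confirms this walk-through for `a0`: the captures come out the other way round compared with `0a`.

```python
import re

# Instrumented copy: (.*) records what .* matched in each lookahead.
traced = re.compile(r'^(?=(.*)[0-9])(?=(.*)[a-z])')

m = traced.search('a0')
print(m.groups())  # ('a', ''): .* matched the a, then nothing
```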

Why does ^(?=.*[0-9])(?=.*[a-z])$ behave differently?

That regex can never match, in fact. The lookaheads consume no characters, so ^ and $ must both match at the same position, exactly as in ^$. Only an empty string matches ^$, and an empty string contains neither a letter nor a digit, so the lookaheads always fail.
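If the intent behind the `$` version was to require that the whole string satisfy both conditions, one common approach (an assumption about the goal, not something proposed above) is to let the pattern consume the string after the lookaheads, for example with `.*$`. The sketch contrasts that with the original `$` pattern in Python's `re`.

```python
import re

never = re.compile(r'^(?=.*[0-9])(?=.*[a-z])$')    # only '' can reach $, and '' fails both lookaheads
whole = re.compile(r'^(?=.*[0-9])(?=.*[a-z]).*$')  # .* consumes the string before $

for s in ['0a', 'a0', 'abc', '']:
    m = whole.search(s)
    print(repr(s), never.search(s), m.group() if m else None)
```

Here the lookaheads still only check the conditions; the trailing `.*` is what moves the engine to the end of the string so that `$` can succeed.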

Sweeper
  • Thank you very much for the detailed reply and explanation, and for introducing the website; it's very useful. I finally understand that both cases match the empty string at the beginning, and that the magic part is the `.*`, which makes the ordering irrelevant. Thanks. – sgon00 Nov 17 '18 at 15:54