
I learned that a regex lookahead looks like x(?=y), which means

Matches x only if x is followed by y.

according to MDN. However, I found this code on W3Schools:

<p>A form with a password field that must contain 8 or more characters that are of at least one number, and one uppercase and lowercase letter:</p>

<form action="demo_form.asp">
Password: <input type="password" name="pw" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" title="Must contain at least one number and one uppercase and lowercase letter, and at least 8 or more characters">
<input type="submit">
</form>

Why does (?=.*\d) indicate "at least one number appears in the string"? And why doesn't it matter where the three parenthesized groups match? As I read it, the pattern should require one or more digits first, followed by one or more lowercase letters, then one or more uppercase letters, and then 8 or more characters. What am I getting wrong?

After a little searching, it seems regex differs between languages. Is that what this is about?

Edit: I don't think you guys got my question. I meant that a lookahead looks like x(?=y), but (?=.*\d) isn't preceded by anything, so what is there to match? And my second question: the three parenthesized groups come in a specific order, but the match doesn't have to follow that order, whereas /abc/ matches "abcdd" but not "cbdda". Why doesn't the order matter?

Update: OK, I probably misunderstand lookahead, and thanks to whoever changed my title for this problem. So here's my final update, in case nothing more is needed after this:

My problem is as the title says: a lookahead (?=pattern) can omit the preceding pattern, so what does it mean when there is nothing before the parentheses? I searched for 'lookahead', and almost every explanation comes with a preceding pattern.

I also tried something in a regex tester: /(?=\d)/ produces an infinite match if the string contains a digit, like "a2", but it shows "no match" if the string has no digit, like "a".

Interestingly, /(?=\d)./ matches any digit, so it now seems equivalent to \d.

I have no idea what's going on right now. I'll go and study lookahead again, but any further answers are welcome, thanks.

Drake Xiang
  • "why doesn't the order matter" -- because a look ahead does not consume the input. That's basically the gist of my answer … – kay Mar 11 '16 at 19:06
  • @Drake: `.*` matches zero or more characters other than a newline, as many as possible. Thus, a `(?=.*pattern)` lookahead just checks whether the pattern occurs somewhere further on the current line. If it does, "true" is returned, and the engine searches further or returns a valid match. Is that clear? – Wiktor Stribiżew Mar 11 '16 at 20:29
  • The `pattern` attribute value is *anchored* by default, `^(?:` is added at the beginning, and `)$` is added at the end. Thus, the lookaheads actually are executed one by one after the start position was matched. See [this post](http://stackoverflow.com/questions/32477182/restricting-character-length-in-regular-expression/32477224#32477224) on how such lookaheads work. – Wiktor Stribiżew Mar 11 '16 at 20:37
  • [One more explanation](http://www.regular-expressions.info/lookaround.html): *The difference is that lookaround actually matches characters, but then **gives up the match, returning only the result: match or no match**. That is why they are called **"assertions"**. They do not consume characters in the string, but only assert whether a match is possible or not.*. – Wiktor Stribiżew Mar 11 '16 at 20:49
  • *what does it mean when nothing before the parentheses?* - Wrong, there are *empty string locations* before each character in a string. An unanchored lookahead will be executed (the lookahead will try to match its subpattern against the string) at every such location (in `abc`, `(?=\w)` will match 3 empty strings: before `a`, `b` and `c`). – Wiktor Stribiżew Mar 11 '16 at 20:52
  • *And I tried something on regex tester: /(?=\d)/ will create an infinite match if the string contains a digit* is only true if you use a global modifier `/g`. Check [this post](http://stackoverflow.com/a/33903830/3832970) on how to override this behavior. – Wiktor Stribiżew Mar 11 '16 at 20:58
  • */(?=\d)./ will match for any digit, now it seems equals to \d* - yes, in meaning, but not in how it is done internally. The lookahead does double work before matching the digit with `.`. – Wiktor Stribiżew Mar 11 '16 at 21:02
  • @WiktorStribiżew Thank you so much for the detailed explanation. I read the post from regular-expressions.info and I did get some clue from the line you quoted. I have to sleep now, hope my mind will be clearer tomorrow. Thanks again, good night – Drake Xiang Mar 11 '16 at 21:17
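
The points made in the comments above (empty-string positions, zero-width matches, and `/(?=\d)./` versus `\d`) can be checked with a short JavaScript sketch. The sample string `'a2b3'` and the variable names are made up for illustration, and `matchAll` is assumed to be available (any reasonably recent browser or Node.js):

```javascript
// A bare lookahead matches the EMPTY string at every position
// where the assertion holds, i.e. right before each digit.
const sample = 'a2b3'; // hypothetical test string
const hits = [...sample.matchAll(/(?=\d)/g)];

const positions = hits.map(m => m.index);  // where the matches start
const lengths = hits.map(m => m[0].length); // how many characters each match consumed
console.log(positions); // [ 1, 3 ]
console.log(lengths);   // [ 0, 0 ]

// Adding a consuming `.` after the lookahead yields the digit itself,
// which is the same result as plain \d.
const withDot = sample.match(/(?=\d)./g);
const plainDigit = sample.match(/\d/g);
console.log(withDot, plainDigit); // [ '2', '3' ] [ '2', '3' ]

// With no digit in the string, the assertion fails at every position.
const noDigit = 'a'.match(/(?=\d)/);
console.log(noDigit); // null
```

The zero-length matches are why a tester running with the `/g` flag can appear to loop forever: the engine has to step forward by hand after each empty match.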

1 Answer


The (?=pattern) is a regex lookahead. It's a zero-width, "true or false" part of the pattern that doesn't actually "eat" any characters, but must match (be true) for the expression to succeed. So,

(?=.*\d)

means "look ahead to see .*\d, which is 'anything' (any number of times, greedy), followed by a digit". Since the .* will by default eat up all characters until the end of the string, the \d initially has nothing left to match. The .* then backtracks, giving up one character at a time, until the \d can match. Since * means 'zero or more', .* will give up everything it has matched, if necessary, to let the \d match. Thus, at least one digit somewhere in the string is enough for the lookahead to succeed, and because the lookahead consumes nothing, the next part of the pattern starts again from the same position.
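
As a rough illustration of why the order of the three lookaheads in the question's pattern does not matter, here is a minimal JavaScript sketch. The `^(?:` and `)$` wrapper is added by hand to imitate the anchoring that the `pattern` attribute applies, as described in the comments; `isValidPassword` and the sample passwords are hypothetical names invented for this example:

```javascript
// The question's pattern, anchored manually the way the HTML `pattern`
// attribute would anchor it.
const passwordPattern = /^(?:(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,})$/;

// Hypothetical helper, not part of any library.
function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

// Every lookahead runs from the same starting position (the beginning of
// the string) and consumes nothing, so the digit, the lowercase letter and
// the uppercase letter may appear in any order.
console.log(isValidPassword('abcDEF12')); // true
console.log(isValidPassword('12DEFabc')); // true
console.log(isValidPassword('abcdefgh')); // false: no digit, no uppercase letter
console.log(isValidPassword('aB1'));      // false: all lookaheads pass, but .{8,} fails

// On its own, a successful lookahead matches the empty string.
const bare = /^(?=.*\d)/.exec('abc7');
console.log(bare[0] === ''); // true
```

Only the trailing `.{8,}` actually consumes characters; the three lookaheads are independent yes/no checks made before it runs.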

Scott Weaver