3

-edit- NOTE the ? at the end of .{2,}?

I found out you can write

.{2,}?

Isn't that exactly the same as below?

.{2}

7 Answers

18

No. {2,} means two times or more while {2} means exactly two times. Quantifiers are greedy by default, so given the string foo you would get foo if you use .{2,}, but fo if you use .{2,}? because you made it lazy. However, the latter is allowed to match more than two times if necessary, but .{2} always means exactly two characters.

So if you have the string test123 and the pattern .{2,}?\d, you would get test1 because it has to match up to four characters so the \d can also match.
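The claims above are easy to check with a short script. This is only a sketch using Python's `re` module (the question does not name a regex flavor, so the choice of Python is an assumption), and `first_match` is a helper name made up for this illustration.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy: takes as many characters as it can.
greedy = first_match(r".{2,}", "foo")
# Lazy: stops at the minimum of two when nothing forces it to go on.
lazy = first_match(r".{2,}?", "foo")
# Exactly two characters, never more.
exact = first_match(r".{2}", "foo")
# Lazy, but it has to keep growing until \d can match.
forced = first_match(r".{2,}?\d", "test123")

print(greedy, lazy, exact, forced)  # foo fo fo test1
```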

Daniel Brückner
Daniel Egeberg
6

No, they are different. ^.{2,}?$ matches strings whose length is at least 2 (as seen on rubular.com):

12
123
1234

By contrast, ^.{2}$ only matches strings whose length is exactly 2 (as seen on rubular.com).

It's correct that being reluctant, .{2,}? will first attempt to match only two characters. But for the overall pattern to match, it can take more. This is not the case with .{2}, which can only match exactly 2 characters.
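The same comparison can be reproduced locally instead of on rubular.com. The snippet below is a rough sketch in Python's `re` module (an assumption, since rubular itself uses Ruby); the one-character sample `"1"` is added here only to show the lower bound.

```python
import re

lazy_anchored = re.compile(r"^.{2,}?$")   # at least two characters
exact_anchored = re.compile(r"^.{2}$")    # exactly two characters

samples = ["1", "12", "123", "1234"]

# Collect the samples each anchored pattern accepts.
lazy_hits = [s for s in samples if lazy_anchored.search(s)]
exact_hits = [s for s in samples if exact_anchored.search(s)]

print(lazy_hits)   # ['12', '123', '1234']
print(exact_hits)  # ['12']
```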

Community
polygenelubricants
3

In isolation they probably behave identically, but not inside larger expressions, because the lazy version is allowed to match more than two symbols.

             abx        abcx

^.{2,}?x$    match      match
^.{2}x$      match      no match
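The table can be regenerated with a few lines of code. This is a minimal sketch in Python's `re` module (an assumed flavor; the answer does not specify one), and the variable names are illustrative only.

```python
import re

patterns = [r"^.{2,}?x$", r"^.{2}x$"]
inputs = ["abx", "abcx"]

# Map (pattern, input) to True when the pattern matches the input.
results = {(p, s): bool(re.search(p, s)) for p in patterns for s in inputs}

for p in patterns:
    row = ["match" if results[(p, s)] else "no match" for s in inputs]
    print(p, row)
```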
Daniel Brückner
  • I almost picked you, but I liked rubular.com and would like more people to notice. – Jul 06 '10 at 08:24
2

What makes this question especially interesting is that there are times when .{2,}? is equivalent to .{2}, but it should never happen. Others have already pointed out how a reluctant quantifier at the very end of a regex always matches the minimum number of characters because there's nothing after it to force it to consume more.

The other place they shouldn't be used is at the end of a subexpression inside an atomic group. For example, suppose you try to match foo bar with

f(?>.+?) bar

The subexpression initially consumes the first 'o' and hands off to the next part, which tries unsuccessfully to match a space. Without the atomic group, it would backtrack and let the .+? consume another character. But it can't backtrack into the atomic group, and there's no wiggle room before the group, so the match attempt fails.

A reluctant quantifier at the end of a regex or at the end of an atomic subexpression is a definite code smell.
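The atomic-group failure can be observed directly. The sketch below assumes Python, whose `re` module only accepts the `(?>...)` syntax from version 3.11 on, so the atomic pattern sits behind a version check; the non-atomic comparison pattern `f(?:.+?) bar` is added here for contrast and is not from the answer.

```python
import re
import sys

text = "foo bar"

# Non-atomic group: the engine can backtrack and let .+? consume a second 'o'.
plain = re.search(r"f(?:.+?) bar", text)

# Atomic groups need Python 3.11+; on older versions the pattern is not compiled at all.
atomic_supported = sys.version_info >= (3, 11)
atomic = re.search(r"f(?>.+?) bar", text) if atomic_supported else None

print(plain.group(0))  # foo bar
if atomic_supported:
    # .+? is locked at one character inside the atomic group, so there is no match.
    print(atomic)  # None
```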

Alan Moore
0

Not exactly. Using PHP to do a regexp match and display the matches:

$string = 'aaabbaabbbaaa';

// Exactly two b's followed by an a.
preg_match_all('/b{2}a/', $string, $matches, PREG_SET_ORDER);

echo '<pre>';
var_dump($matches);
echo '</pre>';

// Two or more b's (lazy) followed by an a.
preg_match_all('/b{2,}?a/', $string, $matches, PREG_SET_ORDER);

echo '<pre>';
var_dump($matches);
echo '</pre>';

First result gives:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "bba"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "bba"
  }
}

Second result gives:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "bba"
  }
  [1]=>
  array(1) {
    [0]=>
    string(4) "bbba"
  }
}

With b{2} the match only ever contains two b's; with b{2,}? it contains two or more, as many as are needed for the following a to match.

Mark Baker
0

x.{2,}?x matches "xasdfx" in "xasdfxbx" but x.{2}x does not match at all.

Without the trailing ?, the first one will match the whole string.
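All three variants can be compared side by side. This is a small sketch assuming Python's `re` module (the answer does not name a flavor); the variable names are illustrative.

```python
import re

text = "xasdfxbx"

lazy = re.search(r"x.{2,}?x", text)    # stops at the first x that comes at least two characters after the opening x
exact = re.search(r"x.{2}x", text)     # needs an x exactly three positions after an x
greedy = re.search(r"x.{2,}x", text)   # runs to the last possible x

print(lazy.group(0))    # xasdfx
print(exact)            # None
print(greedy.group(0))  # xasdfxbx
```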

Amarghosh
0

No, they are different:

.{2,}? : Any character, at least 2 repetitions, as few as possible

.{2} : Any character, exactly 2 repetitions

Thibault Falise