4

I am trying to extract the error number from strings like "Wrong parameters - Error 1356":

 Pattern p = Pattern.compile("(\\d*)");
 Matcher m = p.matcher(myString);
 m.find();
 System.out.println(m.group(1));

This does not print anything, which seemed strange to me, since according to Wikipedia * "matches the preceding element zero or more times".

I also tested it on www.regexr.com and regex101.com, and the result was the same: nothing for the expression \d*

Then I started testing some variations (all tests were made on the sites I mentioned):

  • (\d)* doesn't work
  • \d{0,} doesn't work
  • [\d]* doesn't work
  • [0-9]* doesn't work
  • \d{4} works
  • \d+ works
  • (\d+) works
  • [0-9]+ works

So I started searching the web for an explanation. The best I could find was the Quantifier section here, which states:

\d? Optional digit (one or none).
\d* Eat as many digits as possible (but none if necessary)
\d+ Eat as many digits as possible, but at least one.
\d*? Eat as few digits as necessary (possibly none) to return a match.
\d+? Eat as few digits as necessary (but at least one) to return a match.

The question

As English is not my primary language, I'm having trouble understanding the difference (mainly the "but none if necessary" part). Could you regex experts explain this in simple words, please?

The closest thing I found to this question here on SO was this one: Regex: possessive quantifier for the star repetition operator, i.e. \d**, but it does not explain the difference.

Micha Wiedenmann
Jorge Campos
  • possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – HamZa Jun 30 '14 at 14:31
  • I will clear your confusion: `\d*` will match empty string. Try to add the `g` modifier to match all. It should also match the digits. [demo](http://regex101.com/r/qQ3aY8/1) – HamZa Jun 30 '14 at 14:34
  • edit: It indeed matches the empty string. – Leif Jun 30 '14 at 14:36
  • @HamZa I also saw that answer, and I did not consider it a duplicate since it gives the same references that I found online, which did not give me a proper explanation for my problem. The answer provided by Frank Schmitt clarifies everything with just this phrase: `matches at the start of the input`. That was the thing I wasn't considering. – Jorge Campos Jun 30 '14 at 14:59
  • A little off-topic, but back to the Java-specific part of your problem: m.group(1) refers to a capturing group "(...)" in the pattern. m.group() only works after a successful match, and m.matches() succeeds only if the whole input is just the number. Use m.find() and then take your text from m.start() to m.end(), keeping in mind that m.end() is one past the index of the last matched character. – Neso Jun 30 '14 at 15:00
  • Thank you @Neso, that is valuable information. – Jorge Campos Jun 30 '14 at 15:26

5 Answers

5

The * quantifier matches zero or more occurrences.

In practice, this means that

\d*

will match any possible input, including the empty string, because matching zero digits always succeeds. So your regex matches at the start of the input string, where there is no digit, and returns the empty string.
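
To make this concrete, here is a minimal Java sketch using the input string from the question. The class name `StarVersusPlus` and the helper `firstMatch` are made up for illustration; only `Pattern`, `Matcher`, `find()` and `group()` come from `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarVersusPlus {

    // Hypothetical helper: returns the first match of regex in input,
    // or null if find() reports no match at all.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Wrong parameters - Error 1356";

        // \d* succeeds immediately at index 0 by matching zero digits.
        System.out.println("[" + firstMatch("\\d*", input) + "]"); // prints "[]"

        // \d+ needs at least one digit, so the engine keeps scanning forward.
        System.out.println("[" + firstMatch("\\d+", input) + "]"); // prints "[1356]"
    }
}
```

The brackets are only there to make the empty match visible in the output.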

Frank Schmitt
3

"But none if necessary" means that the overall pattern does not fail when there are no digits to match. So \d* matches zero or more digits.

For example,

\d*[a-z]*

will match

abcdef

but \d+[a-z]*

will not match

abcdef

because \d+ implies that at least one digit is required.
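
The two patterns above can be checked with a short Java sketch. The class name `OptionalDigits` and the helper `matchesWhole` are illustrative names, and the sketch assumes a whole-string match via `Pattern.matches`, which the answer does not specify.

```java
import java.util.regex.Pattern;

public class OptionalDigits {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \d* may match zero digits, so the letters alone are enough.
        System.out.println(matchesWhole("\\d*[a-z]*", "abcdef"));   // prints "true"

        // \d+ demands at least one digit, and "abcdef" has none.
        System.out.println(matchesWhole("\\d+[a-z]*", "abcdef"));   // prints "false"

        // With a leading digit run, \d+ is satisfied.
        System.out.println(matchesWhole("\\d+[a-z]*", "42abcdef")); // prints "true"
    }
}
```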

Wes
0
\d* Eat as many digits as possible (but none if necessary)

\d* means it matches a digit zero or more times. In your input, the first match attempt is at the start of the string, where there is no digit, so it matches zero digits there. So it prints an empty string.

\d+

It matches a digit one or more times. So it should find and match a digit or a digit followed by more digits.

Avinash Raj
0

With the pattern \d+, at least one digit has to be found, and the match then includes all subsequent digits until a non-digit character is reached.

\d* will match all the empty strings (zero digits) as well as the digits. The .NET Regex engine will return all these empty-string matches in its set of matches.
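
Java (the language used in the question) behaves in a similar way when `find()` is called repeatedly. The following is a rough sketch; the class name `AllStarMatches`, the helper `allMatches` and the shortened input "Error 1356" are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllStarMatches {

    // Hypothetical helper: collects every match that repeated find() calls report.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // \d* yields an empty match at each non-digit position, plus "1356".
        System.out.println(allMatches("\\d*", "Error 1356"));

        // \d+ yields only the digit run.
        System.out.println(allMatches("\\d+", "Error 1356")); // prints "[1356]"
    }
}
```

This is roughly what the `g` modifier on the regex testing sites does: it keeps searching after the first (empty) match instead of stopping there.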

Ananke
0

Simply:

\d* implies zero or more times

\d+ means one or more times