0

Using java.util.regex (JDK 1.6), the regular expression 201210(\d{5,5})Test applied to the subject string 20121000002Test only captures group(0) and does not capture group(1) (the substring 00002) as it should, given the code below:

Pattern p1 = Pattern.compile("201210(\\d{5,5})Test");
Matcher m1 = p1.matcher("20121000002Test");

if(m1.find()){

    for(int i = 1; i<m1.groupCount(); i++){         
    System.out.println("number = "+m1.group(i));            
    }
}

Curiously, another similar regular expression like 201210(\d{5,5})Test(\d{1,10}) applied to the subject string 20121000002Test0000000099 captures group 0 and 1 but not group 2.

In contrast, with JavaScript's RegExp object, the exact same regular expressions applied to the exact same subject strings capture all groups, as one would expect. I checked and re-checked this myself using these online testers:

Am I doing something wrong here? Or is it that Java's regex library really sucks?

Alan Moore
  • 2
    If you add `/` at the beginning and end of your RegExp in JavaScript, it returns a single group. Are you really sure about what you're saying? Have you prepared your own JavaScript test for this (i.e. **no** online editors)? – Luiggi Mendoza Oct 20 '12 at 15:39

5 Answers

1

Change the line

for(int i = 1; i<m1.groupCount(); i++){     

to

for(int i = 1; i<=m1.groupCount(); i++){      //NOTE THE = ADDED HERE    

It now works like a charm!
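For reference, here is a minimal, self-contained sketch of the corrected loop applied to both patterns from the question. The class name `CaptureGroupsDemo` and the helper `capturedGroups` are hypothetical names introduced only for this illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupsDemo {

    // Hypothetical helper: returns the text of capturing groups 1..groupCount()
    // for the first match of regex in input, or an empty list if there is no match.
    static List<String> capturedGroups(String regex, String input) {
        List<String> groups = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // groupCount() excludes group 0, so the last valid index is groupCount() itself.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [00002]
        System.out.println(capturedGroups("201210(\\d{5,5})Test", "20121000002Test"));
        // prints [00002, 0000000099]
        System.out.println(capturedGroups("201210(\\d{5,5})Test(\\d{1,10})", "20121000002Test0000000099"));
    }
}
```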

Haozhun
  • 1
    Thank you folks for all the answers! I just cannot believe it. It never crossed my mind that groupCount() would not include group 0, unlike JavaScript's RegExp exec(). It does not make much sense to me because, after all, group 0 is a damn group! Anyway, I guess I should've debugged the code in more depth... –  Oct 20 '12 at 23:00
1

m1.groupCount() returns the number of capturing groups, i.e. 1 in your first case, so the body of this loop is never entered: for(int i = 1; i<m1.groupCount(); i++)

It should be for(int i = 1; i<=m1.groupCount(); i++)
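To make the off-by-one concrete, the sketch below counts how many times each loop bound iterates for the two patterns in the question. `LoopBoundDemo` and its two helper methods are hypothetical names used only for illustration. It relies on the fact that `groupCount()` is determined by the pattern alone, so it can be called without running a match first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoopBoundDemo {

    // Hypothetical helper: iterations of the loop from the question (exclusive bound).
    static int iterationsExclusive(Matcher m) {
        int n = 0;
        for (int i = 1; i < m.groupCount(); i++) {
            n++;
        }
        return n;
    }

    // Hypothetical helper: iterations of the corrected loop (inclusive bound).
    static int iterationsInclusive(Matcher m) {
        int n = 0;
        for (int i = 1; i <= m.groupCount(); i++) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Matcher one = Pattern.compile("201210(\\d{5,5})Test").matcher("20121000002Test");
        Matcher two = Pattern.compile("201210(\\d{5,5})Test(\\d{1,10})").matcher("20121000002Test0000000099");

        // One capturing group: the exclusive bound never enters the loop.
        System.out.println(one.groupCount() + " " + iterationsExclusive(one) + " " + iterationsInclusive(one)); // prints 1 0 1
        // Two capturing groups: the exclusive bound visits group 1 but skips group 2.
        System.out.println(two.groupCount() + " " + iterationsExclusive(two) + " " + iterationsInclusive(two)); // prints 2 1 2
    }
}
```

This also accounts for the second observation in the question: with two capturing groups, the original loop reaches group 1 but stops before group 2.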

Newbo.O
  • Thank you folks for all the answers! I just cannot believe it. It never crossed my mind that groupCount() would not include group 0, unlike JavaScript's RegExp exec(). It does not make much sense to me because, after all, group 0 is a damn group! Anyway, I guess I should've debugged the code in more depth... –  Oct 21 '12 at 01:19
0

the regular expression "201210(\d{5,5})Test" applied to the subject string "20121000002Test" only captures group(0) and does not capture group(1)

Well, I admit I didn't read the manual either, but if you do, here is what it says for Matcher.groupCount():

Returns the number of capturing groups in this matcher's pattern. Group zero denotes the entire pattern by convention. It is not included in this count.
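As a small illustration of that convention, the sketch below shows that group 0 is still retrievable through `group(0)` even though `groupCount()` does not count it. `GroupZeroDemo` and `inspect` are hypothetical names introduced for this example, and the helper assumes the pattern has at least one capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {

    // Hypothetical helper: returns { groupCount, group(0), group(1) } as strings
    // for the first match, or null if the regex does not match.
    // Assumes the pattern contains at least one capturing group.
    static String[] inspect(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // groupCount() depends only on the pattern, so it can be read before find().
        int count = m.groupCount();
        if (!m.find()) {
            return null;
        }
        return new String[] { String.valueOf(count), m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = inspect("201210(\\d{5,5})Test", "20121000002Test");
        System.out.println(r[0]); // prints 1
        System.out.println(r[1]); // prints 20121000002Test
        System.out.println(r[2]); // prints 00002
    }
}
```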

Peter Lawrey
0
for (int i = 1; i <= m1.groupCount(); i++) { 
                   ↑
              your problem
Ωmega
0

From java.util.regex.MatchResult.groupCount:

Group zero denotes the entire pattern by convention. It is not included in this count.

So iterate through groupCount() + 1.

Jeff Bowman
  • 1
    No, it's just `groupCount()`. The problem is that he's only going up to `groupCount() - 1` now. – Alan Moore Oct 20 '12 at 16:42
  • @Alan Same thing. It should be while `i < groupCount() + 1` or while `i <= groupCount()`. Arguing correctness beyond that is silly. (I favor the former because `<=` is easy to miss in loop conditions.) – Jeff Bowman Oct 20 '12 at 17:02