
I want to know if there is a difference in these two regular expressions:

Pattern.compile("\"title\":\"(.*?)\"");
Pattern.compile("\"title\":\".*\"");

The parts (.*?) and .* look like they have the same meaning...

Here I get exactly the same results:

        String title = null;
        Pattern p = Pattern.compile("\"title\":\"(.*?)\"");
        //Pattern p = Pattern.compile("\"title\":\".*\"");
        Matcher m = p.matcher("sdfssdfsdfsdfsdf\"title\":\"Here is the title\"sdfgdfgdfgdfgdfg");
        if (m.find()) {
            title = m.group();
        }
        System.out.println(title);

Output:

"title":"Here is the title"

If I do not use parentheses, I'm still able to find the separate matches like this:

Pattern p = Pattern.compile("\"title\":\".*?\"");
Matcher m = p.matcher("sdfssdfsdfsdfsdf\"title\":\"Here is the title\"dfdfgrt\"title\":\"Here is the title\"");
while (m.find()) {
    System.out.println(m.group());
}

The output:

"title":"Here is the title"
"title":"Here is the title"

So - do I really need parentheses here?

Peter Mortensen
Ernestas Gruodis

1 Answer


There are two things here:

() --> Specifies a capturing group. So, if you want to capture something and refer to it later, put the part you want to capture inside the parentheses. Without the parentheses, you don't capture the data.

.* --> is greedy, i.e., it first grabs as much of the string as it can, then gives back one character at a time until the rest of the pattern matches.

.*? --> is lazy (AKA reluctant), i.e., it starts by matching zero characters, then takes one more character at a time and stops as soon as the rest of the pattern matches.
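The two quantifiers only give the same result when there is a single closing quote after the title, as in the question's input. Here is a minimal sketch of a case where they differ; the input string and the `firstMatch` helper are made up for illustration and are not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns the first full match of regex in input, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed input: a second quoted value follows the title.
        String input = "\"title\":\"A\",\"other\":\"B\"";

        // Greedy: .* runs to the end, then backtracks to the LAST quote.
        System.out.println(firstMatch("\"title\":\".*\"", input));

        // Lazy: .*? stops at the FIRST quote that lets the pattern match.
        System.out.println(firstMatch("\"title\":\".*?\"", input));
    }
}
```

With this input the greedy pattern should match the whole string up to the final quote, while the lazy one should match only `"title":"A"`.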

You could look at the official documentation here.

Try this code to see a match in which the group captures an empty string:

    Pattern p = Pattern.compile("abc(.*?)");
    Matcher m = p.matcher("abc");
    while (m.find()) {
        System.out.println("hi");
        System.out.println("group1: " +  m.group(1));
    }

Output:

hi
group1:
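To see what the parentheses change for the pattern in the question, here is a small sketch comparing `group()` / `group(0)` with `group(1)`. The `firstGroup` helper and the shortened input are assumptions made for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns the given group of the first match, or null if nothing matches.
    public static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "xx\"title\":\"Here is the title\"yy";
        String withGroup = "\"title\":\"(.*?)\"";
        String withoutGroup = "\"title\":\".*?\"";

        // group(0) is the whole match, the same as group().
        System.out.println(firstGroup(withGroup, input, 0));

        // group(1) is only the text inside the parentheses.
        System.out.println(firstGroup(withGroup, input, 1));

        // Without parentheses there is no group 1, so group(1) throws.
        try {
            firstGroup(withoutGroup, input, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("no group 1: " + e.getMessage());
        }
    }
}
```

So the parentheses are only needed if you want the title text on its own, without the surrounding `"title":"..."` part.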
Peter Mortensen
TheLostMind
  • Somehow I found (EDIT2 above) that in my situation parentheses are not needed.. Am I right? – Ernestas Gruodis Mar 16 '15 at 11:41
  • @ErnestasGruodis - Well, `m.group()` returns the *entire matched String*. Not the captured group. Use parenthesis and try `m.group(1)` and see what happens :P (hint : Don't try this in production code :P) – TheLostMind Mar 16 '15 at 12:10
  • Question is answered 100%. But why not in production code (I mean there is a situation)? – Ernestas Gruodis Mar 16 '15 at 12:26
  • @ErnestasGruodis - Be careful to make sure that *there is a group* else `group(1)` will give you error :) . Note that `group()` is same as `group(0)`. *Actual* groups start from `1` . Note : `group()` can be used freely . – TheLostMind Mar 16 '15 at 12:28
  • But if `m.find()` is `true` - doesn't that mean that group 1 exists (talking about the same regex with parentheses)? – Ernestas Gruodis Mar 16 '15 at 12:39
  • @ErnestasGruodis - No. `find()` tries to do a match and returns on the first match. It doesn't mean that there is a captured group. It means that there is a match. In your *edit-2* change `group()` to `group(1)` and see what I mean :P – TheLostMind Mar 16 '15 at 12:46
  • OK, but this group `(.*?)` will always be captured :)? Even there is no symbols.. Also `m.groupCount()` would help think. – Ernestas Gruodis Mar 16 '15 at 12:59
  • @ErnestasGruodis - There are 2 things here. 1. `find()` will return false if what you are looking for is not found in the string. 2. if a match is found then `(.*?)` will print `""` if nothing has been captured.. Check my edit :) – TheLostMind Mar 16 '15 at 13:00
  • At least it will not throw an exception. That is OK, thanks indeed for help! – Ernestas Gruodis Mar 16 '15 at 13:06
  • @ErnestasGruodis - Remove the *braces* and it will :P.. You are welcome :) – TheLostMind Mar 16 '15 at 13:07