As the title indicates, how do I capture unpaired brackets or parentheses with regex in Java? I am new to Java. For instance, suppose I have the string below:

Programming is productive, (achieving a lot, and getting good results), it is often 1) demanding and 2) costly.

How do I capture 1) and 2)? I have tried:

([^\(\)][\)])

But the result I get also includes s), as shown below, instead of just 1) and 2):

s), 1) and 2)

I have checked the question "Regular expression to match balanced parentheses", but it seems to be about recursive or nested structures, which is quite different from my situation. I want to match a right parenthesis or right bracket that has no associated left parenthesis or bracket, along with the text attached to it.
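
To see why the attempted pattern also returns s), it may help to run it as is. The sketch below is only an illustration: the class name `OriginalAttempt` and the helper `findAll` are hypothetical, not part of the question. The pattern `[^\(\)][\)]` accepts any single character that is not a parenthesis followed by `)`, so the final `s` of `results)` qualifies just as well as `1` and `2`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalAttempt {

    // Hypothetical helper: collects every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Programming is productive, (achieving a lot, and getting good results),"
                + " it is often 1) demanding and 2) costly.";
        // The pattern from the question, with backslashes escaped for a Java string.
        System.out.println(findAll("([^\\(\\)][\\)])", text)); // prints [s), 1), 2)]
    }
}
```

The pattern never looks at whether a `(` was opened earlier, which is why the balanced `(achieving ... results)` still contributes a match.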

  • 2
    Possible duplicate of [Regular expression to match balanced parentheses](https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses). Regex isn't the right tool for this job but it can be done with recursive regexes--there's a Java section in the third most voted answer. – ggorlen Oct 25 '19 at 01:33
  • 2
    I don't think the OP wants to match balanced parentheses. They want to match the right parenthesis or right bracket, along with any associated text that does not have an associated left parenthesis or bracket. – WJS Oct 25 '19 at 01:50
  • @WJS, that is exactly my situation and the desired result. –  Oct 25 '19 at 02:37
  • 1
    @DerickMarfo I have a solution but it is not `regex` so I didn't post it. It uses a simple stack and keeps track of the parens. – WJS Oct 25 '19 at 02:53
  • @WJS No problem, my friend. I am most grateful, since you helped me explain my situation better, and I appreciate your effort. Meanwhile, Emma's answer delivers the solution. May I know how the performance of your solution compares with Emma's? –  Oct 25 '19 at 03:18
  • I simply don't know. Hers is more concise though. – WJS Oct 25 '19 at 03:19
  • 1
    @ggorlen Sorry for the late reply, but for arbitrarily nested parentheses, I believe the following would match the full outer group: `(?=\()(?:(?=.*?\((?!.*?\1)(.*\)(?!.*\2).*))(?=.*?\)(?!.*?\2)(.*)).)+?.*?(?=\1)[^(]*(?=\2$)` and the following would match inner groups too: `(?=\()(?=((?:(?=.*?\((?!.*?\2)(.*\)(?!.*\3).*))(?=.*?\)(?!.*?\3)(.*)).)+?.*?(?=\2)[^(]*(?=\3$)))`. Borrowed from this blog: [http://www.drregex.com/2017/11/match-nested-brackets-with-regex-new.html] –  Oct 25 '19 at 09:33

2 Answers

Maybe,

\b\d+\)

might simply return the desired output, I guess.

Another way is to look at what left boundary you have, which in this case is a digit, then at which other characters may appear before the closing parenthesis, and then design another simple expression similar to:

\b\d[^)]*\) 

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegularExpression{

    public static void main(String[] args){

        final String regex = "\\b\\d[^)]*\\)";
        final String string = "Programming is productive, (achieving a lot, and getting good results), it is often 1) demanding and 2) costly.\n\n"
             + "Programming is productive, (achieving a lot, and getting good results), it is often 1a b) demanding and 2a a) costly.\n\n\n"
             + "Programming is productive, (achieving a lot, and getting good results), it is often 1b) demanding and 2b) costly.\n\n"
             + "It is not supposed to match ( s s 1) \n";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }


    }
}

Output

Full match: 1)
Full match: 2)
Full match: 1a b)
Full match: 2a a)
Full match: 1b)
Full match: 2b)
Full match: 1)

Emma
  • 2
    It is not supposed to match `( s s 1)` because the right paren has a mate. – WJS Oct 25 '19 at 02:51
  • 1
    @Emma Thank you!!! You are such a genius. I am so grateful. You have really been helpful, because I had googled for hours without finding a solution. I believe you have saved many others like me! Thanks once again, amazing programmer! –  Oct 25 '19 at 03:24

This is not a regex solution (obviously), but I can't think of a good way to do it with regex. It simply uses a stack to keep track of parens.

For the input String "(*(**)**) first) second) (**) (*ksks*) third) ** fourth)( **)"

It prints out

first)
second)
third)
fourth)

All other parentheses are ignored because they are matched.

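      import java.util.ArrayList;
      import java.util.List;
      import java.util.Stack;

      public class UnpairedParens {
         public static void main(String[] args) {
            String s =
                  "(*(**)**) first) second) (**) (*ksks*) third) ** fourth)( **)";
            List<String> found = new ArrayList<>();
            // Non-paren characters seen since the last space.
            Stack<Character> tokens = new Stack<>();
            // Number of '(' still waiting for a matching ')'.
            int pcount = 0;

            for (char c : s.toCharArray()) {
               switch (c) {
                  case ' ':
                     // A space ends the current word.
                     tokens.clear();
                     break;
                  case '(':
                     pcount++;
                     break;
                  case ')':
                     pcount--;
                     if (pcount == -1) {
                        // No open '(' for this ')': rebuild the text before it.
                        String v = ")";
                        while (!tokens.isEmpty()) {
                           v = tokens.pop() + v;
                        }
                        found.add(v);
                        pcount = 0;
                     }
                     break;
                  default:
                     tokens.push(c);
               }
            }
            found.forEach(System.out::println);
         }
      }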

Note: Integrating brackets (]) into the above would be a challenge (though not impossible) because one would need to check constructs like ( [ ) ] where it is unclear how to interpret it. That's why when specifying requirements of this sort they need to be spelled out precisely.
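
As a rough illustration of the note above, here is one possible way to include square brackets. It is a sketch under a loudly stated assumption: parentheses and brackets are counted independently, so the ambiguous `( [ ) ]` is treated as fully paired. Other interpretations are equally defensible. The class name `UnpairedClosers` and the method `findUnpaired` are hypothetical, and a `StringBuilder` is swapped in for the stack.

```java
import java.util.ArrayList;
import java.util.List;

class UnpairedClosers {

    // Assumption: '(' and '[' are tracked by separate counters, so
    // interleaved input such as "( [ ) ]" counts as balanced.
    static List<String> findUnpaired(String s) {
        List<String> found = new ArrayList<>();
        StringBuilder word = new StringBuilder(); // text since the last space
        int parens = 0;   // open '(' awaiting ')'
        int brackets = 0; // open '[' awaiting ']'
        for (char c : s.toCharArray()) {
            switch (c) {
                case ' ':
                    word.setLength(0);
                    break;
                case '(':
                    parens++;
                    break;
                case '[':
                    brackets++;
                    break;
                case ')':
                    if (parens > 0) {
                        parens--;
                    } else {
                        // No opener available: report the word plus the closer.
                        found.add(word + ")");
                        word.setLength(0);
                    }
                    break;
                case ']':
                    if (brackets > 0) {
                        brackets--;
                    } else {
                        found.add(word + "]");
                        word.setLength(0);
                    }
                    break;
                default:
                    word.append(c);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints 1) and a] on separate lines; (x) and [y] are paired and skipped.
        findUnpaired("see 1) and a] but not (x) or [y]").forEach(System.out::println);
    }
}
```

If interleaved pairs should instead be rejected, a single stack of openers would be needed in place of the two counters, and the requirements would have to say which closer is then reported.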

WJS
  • 1
    This is very much the right idea. However, this only detects unmatched parens on the first level of recursion. `"(*(**)first) **)"` and we get the wrong output because we have no idea that the intent was to match `first)`. OP's example is a trivial case and I think more clarification is in order from OP about what we need to detect (i.e. how do we distinguish an unmatched nested group? by digits?). – ggorlen Oct 25 '19 at 15:22
  • @ggorlen All I want to achieve is to capture unbalanced parentheses, but not the balanced set. So, 1), a), e), should be captured, but, (3 and 4), (2,8), (hello world), etc. should not be captured if available in the same string. The half brackets will not necessarily be nested. –  Oct 25 '19 at 23:07
  • @ggorlen Thank you in the first place for the reply, however, per your example, `"(foo 1) bar)"`, I will only need `"bar)"`, and not `"1)"`. So, I believe now you've got my point. –  Oct 25 '19 at 23:36
  • OK, then this is the best answer. Thanks for the clarification. – ggorlen Oct 25 '19 at 23:48
  • @ggorlen Thank you too for your understanding –  Oct 25 '19 at 23:57
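
Given the requirement as clarified in the comments (capture only closers that have no opener, so `"(foo 1) bar)"` yields just `bar)`), a regex-flavoured alternative might be to blank out balanced pairs first and then match whatever closers remain. This is a minimal sketch of that idea, not the method of either answer; the class name `UnpairedByStripping` and the method `findUnpaired` are hypothetical, and replacing each pair with a space (rather than deleting it) is an assumption made to keep neighbouring words apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnpairedByStripping {

    // An innermost balanced pair: '(' ... ')' with no parens in between.
    private static final Pattern BALANCED = Pattern.compile("\\([^()]*\\)");
    // A leftover ')' together with the non-space, non-paren text before it.
    private static final Pattern UNPAIRED = Pattern.compile("[^\\s()]*\\)");

    static List<String> findUnpaired(String input) {
        String stripped = input;
        String previous;
        // Repeat until nothing changes, so nested pairs are removed inside out.
        do {
            previous = stripped;
            stripped = BALANCED.matcher(stripped).replaceAll(" ");
        } while (!stripped.equals(previous));

        List<String> found = new ArrayList<>();
        Matcher matcher = UNPAIRED.matcher(stripped);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findUnpaired("(foo 1) bar)")); // prints [bar)]
    }
}
```

The loop is what stands in for recursion here: each pass can only remove pairs that contain no other parentheses, so a pair nested n levels deep disappears after n passes.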