
I was not sure how to phrase the title of this question, but imagine I have the following string:

(1,2),(3,(4))

I want a regex that allows me to get 1,2 and 3,(4)

The regex I currently have is \\(([^)]*)\\).

The problem with this regex is that it gets me 1,2 and 3,(4. This happens because [^)]* stops at the first closing parenthesis it finds, which is the one that belongs to (4). I need a regex that skips over the parentheses of (4) and only matches the outermost pairs, if that makes sense.
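To make the failure concrete, here is a minimal reproduction. It assumes the pattern is used through Java's java.util.regex (the doubled backslashes suggest a Java string literal, but that is an assumption); the class name RegexRepro and the helper groups are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRepro {
    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> groups(String input) {
        Matcher m = Pattern.compile("\\(([^)]*)\\)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // [^)]* happily consumes the inner '(' but stops at the first ')',
        // so the second group is cut short.
        System.out.println(groups("(1,2),(3,(4))")); // prints [1,2, 3,(4]
    }
}
```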

EDIT: to give more insight into the problem, this is the exact kind of string I expect: (STRING1,STRING2),(STRING3,STRING4), where each STRING is a string of random characters (it can contain letters, numbers, whitespace, and other symbols). Since the characters are random, the strings themselves can contain parentheses, which will trip up the regex. I basically need to keep track of every open parenthesis found, so that I can ignore the close parentheses that match inner open parentheses.

Tiago Silva
  • You say you need to get `1,2` and `3,(4)` but you also want to ignore the 4's parenthesis? – Cagri Dec 14 '20 at 16:51
  • Regex is not good at matching arbitrarily nested grammars like that. It will be easier to parse that using your own code, or perhaps a parser library. – Hulk Dec 14 '20 at 16:51
  • This example is too specific. Please give more examples. Also, could there be other characters in the string that you want to ignore? `(1,2), (3,4)) xyz`. What about white spaces, etc? – jrook Dec 14 '20 at 16:52
  • I added more details to my problem. Hopefully it is more clear now. I think @Hulk has gotten the gist of it, but I am not sure how I can implement this. – Tiago Silva Dec 14 '20 at 16:58
  • @Hulk but I am interested in what is inside. I want exactly what I show in the example – Tiago Silva Dec 14 '20 at 17:04

1 Answer


Regex is probably not the appropriate tool for this job. Instead, a simple loop that tracks the current nesting depth can find such matches in a single pass, without recursion.

Note: In the meantime, this question has been (correctly) closed as a duplicate. The following is a somewhat naive and certainly not perfect Java implementation of an algorithm very similar to the one proposed in the answer linked from the accepted answer of the linked duplicate.

If you are only interested in the outermost pairs of matching parentheses, something like this may do:

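import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        String s = "(1,2),(3,(4))";

        List<String> matches = new ArrayList<>();

        int depth = 0;       // current nesting level
        int startIndex = 0;  // index of the '(' that opened the current outermost group
        int i = 0;           // index of the current character

        for (char c : s.toCharArray()) {
            if (c == '(') {
                if (depth == 0) {
                    startIndex = i;  // entering a new outermost group
                }
                depth++;
            }
            if (c == ')') {
                depth--;
                if (depth == 0) {
                    // the outermost group just closed: keep its contents, without the parentheses
                    matches.add(s.substring(startIndex + 1, i));
                }
            }
            i++;
        }
        System.out.println(matches.size());  // 2
        System.out.println(matches);         // [1,2, 3,(4)]
    }
}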

Note that this code does not handle cases where the input contains more closing than opening parentheses: the depth just becomes negative. You may want to add syntax-error handling for these cases (and I'm sure there are others).
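One possible way to add that error handling is sketched below. It is only a sketch: the class name OuterGroups, the method outerGroups, and the choice of IllegalArgumentException are assumptions made for illustration, and other policies (skipping stray parentheses, returning partial results) may suit your input better.

```java
import java.util.ArrayList;
import java.util.List;

class OuterGroups {
    // Hypothetical helper: returns the contents of each outermost (...) group,
    // and rejects input whose parentheses are not balanced.
    static List<String> outerGroups(String s) {
        List<String> matches = new ArrayList<>();
        int depth = 0;
        int startIndex = -1; // index of the '(' that opened the current outermost group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    startIndex = i;
                }
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    // a ')' with nothing open: more closing than opening parentheses
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
                depth--;
                if (depth == 0) {
                    matches.add(s.substring(startIndex + 1, i));
                }
            }
        }
        if (depth != 0) {
            // input ended while an outermost group was still open
            throw new IllegalArgumentException(
                    "Unclosed group starting at index " + startIndex);
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(outerGroups("(1,2),(3,(4))")); // prints [1,2, 3,(4)]
    }
}
```

With this variant, an input such as "(1))" throws instead of silently letting the depth go negative, and "((1)" throws once the end of the string is reached.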

Hulk