
I'm having some trouble while trying to split a string with a nested separator.

My string looks like `"(a,b(1,2,3),c,d(a,b,c))"`.

How can I get the array `["a","b(1,2,3)","c","d(a,b,c)"]`?

I obviously can't use `.split(",")`, since it would also split my substrings.

Zenoo
  • Does the nesting only occur by using `(...)`? Is there at most one level of nesting? If not, then regex won't be a good fit, since it can't deal with arbitrary nesting. – Thomas Jan 24 '18 at 09:30
  • What have you tried so far? Any regex (via https://regex101.com/)? – DamCx Jan 24 '18 at 09:31
  • The nesting only occurs by using `(...)`, and it can be nested to any depth. – Zenoo Jan 24 '18 at 09:31
  • @Thomas actually nesting level doesn't really matter here because everything inside the first `()` level seems to be discarded – Kaddath Jan 24 '18 at 09:32
  • @Kaddath no, it does matter. How would you find the matching closing parenthesis for arbitrary nesting depth with a regex? Assume something like `(a,b(c(d(1,2,...),3,4)),5)` (just an arbitrary example which should fit the OP's description). What regex would correctly match the closing parenthesis for the b group when you don't know whether there could be more parentheses in `...`? – Thomas Jan 24 '18 at 09:34
  • @DamCx I've already tried multiple regexes, but I can't find one that works for me. `([^,]*),` doesn't work, since it stops at the nearest comma. – Zenoo Jan 24 '18 at 09:37
  • Possible duplicate of [Java: splitting a comma-separated string but ignoring commas in quotes](https://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes) – Tom Jan 24 '18 at 09:42
  • Try the following code: `public static void main(String [] a) throws FileNotFoundException, PrintException { String str="(a,b(1,2,3),c,d(a,b,c))"; Pattern p = Pattern.compile("[a-z](\\(.*?\\))?[),]??"); Matcher m = p.matcher(str); while( m.find()) { System.out.println(m.group()); } }` – akshaya pandey Jan 24 '18 at 09:54
  • @Thomas you're totally right, just a personal bias: I'm the kind of guy who counts brackets to match them even if there's supposed to be just one level. – Kaddath Jan 24 '18 at 10:17
  • @Kaddath I personally prefer to use regex as well, but in some cases they're harder to use or not usable at all, especially with nesting of unknown depth. The problem is not counting brackets but knowing/anticipating how many there are and ignoring any excess ones (that's the hard part). – Thomas Jan 24 '18 at 10:20
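To illustrate Thomas's point, here is a small sketch that runs the regex from akshaya pandey's comment against the OP's string and against Thomas's deeper example. The class `RegexNestingDemo` and method `regexSplit` are hypothetical names used only for this demo. The reluctant `.*?` stops at the first `)` it meets, which happens to be right for one level of nesting but cuts the `b` group short once it contains nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNestingDemo {
    // The regex suggested in the comments: a letter, optionally followed by "(...)".
    private static final Pattern P = Pattern.compile("[a-z](\\(.*?\\))?[),]??");

    // Collects every match of P in s, in order.
    public static List<String> regexSplit(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // One level of nesting: the reluctant .*? stops at the correct ')'.
        System.out.println(regexSplit("(a,b(1,2,3),c,d(a,b,c))")); // [a, b(1,2,3), c, d(a,b,c)]
        // Deeper nesting: .*? stops at the first ')' it sees, so the b group is truncated
        // and the remaining elements (digits only) are never matched.
        System.out.println(regexSplit("(a,b(c(d(1,2),3,4)),5)")); // [a, b(c(d(1,2)]
    }
}
```

Switching to a greedy `.*` does not help either: on the OP's string it would run past the `b` group's own `)` up to the last one. Neither quantifier can count depth, which is why both answers track the nesting explicitly.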

2 Answers


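Here is a straightforward, non-recursive function that splits your string the way you want:

```java
import java.util.ArrayList;
import java.util.List;

// Splits "(a,b(1,2,3),c,d(a,b,c))" into ["a", "b(1,2,3)", "c", "d(a,b,c)"].
// Assumes the whole string is wrapped in one pair of outer parentheses.
private String[] specialSplit(String s) {
    List<String> result = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    int parenCount = 0; // current nesting depth
    for (int i = 1; i < s.length() - 1; i++) { // go from 1 to length - 1 to discard the surrounding ()
        char c = s.charAt(i);
        if (c == '(') parenCount++;
        else if (c == ')') parenCount--;

        if (parenCount == 0 && c == ',') {
            // top-level comma: finish the current element
            result.add(sb.toString());
            sb.setLength(0); // clear string builder
        } else {
            sb.append(c);
        }
    }
    result.add(sb.toString()); // the last element has no trailing comma
    return result.toArray(new String[0]);
}
```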

Basically, we iterate through all the characters of the string while keeping track of the parentheses. The outer (first and last) parentheses are skipped. We only split the string when we have seen the same number of opening and closing parentheses and the current character is `,`.

Since it makes a single pass over the string, this method will likely run faster than a regex-based solution.
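For experimenting, here is a self-contained variant of the same depth-counting idea. It is a sketch, not the answer's exact code: the hypothetical `splitTopLevel` assumes the outer parentheses have already been removed (so it also works on input without them), and, as an addition not present in the answer, it rejects unbalanced parentheses with an `IllegalArgumentException`.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplit {

    // Splits s on commas that are not inside any parentheses.
    // Assumption: the outer "(" and ")" have already been stripped from s.
    public static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open '('
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
            }
            if (c == ',' && depth == 0) {
                parts.add(current.toString()); // top-level comma: close the current part
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unmatched '(' in input");
        }
        parts.add(current.toString()); // last part
        return parts;
    }

    public static void main(String[] args) {
        String input = "(a,b(1,2,3),c,d(a,b,c))";
        // Strip the outer parentheses first, as specialSplit does via its loop bounds.
        String inner = input.substring(1, input.length() - 1);
        System.out.println(splitTopLevel(inner));                  // [a, b(1,2,3), c, d(a,b,c)]
        System.out.println(splitTopLevel("a,b(c(d(1,2),3,4)),5")); // [a, b(c(d(1,2),3,4)), 5]
    }
}
```

Nesting depth does not matter here: only the counter's value at each comma is checked, so arbitrarily deep groups stay intact.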

gus3001

A recursive function should work here, just not with plain `split()`. Try parsing your string character by character and act whenever you encounter a comma or parenthesis: `,` means you create a new element, `(` means you start a new nested list, and `)` means you finish the current nested list. This should even work with a more "unrolled" approach (i.e. no recursion, but handling the nesting in a data structure such as a stack).
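A minimal sketch of the recursive approach described above, written as a small recursive-descent parser. The class `NestedParser`, its `Node` type, and the assumption that an element label is any run of characters other than `(`, `)` and `,` are hypothetical, not taken from the question. Each top-level `Node` prints back as the OP's desired string, while `children` keeps the nested structure in case it is needed later.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedParser {

    // Hypothetical tree node: a label such as "b" plus its nested children.
    public static final class Node {
        public final String label;
        public final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // Rebuilds the original text, e.g. "b(1,2,3)".
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String s;
    private int pos = 0; // index of the next unread character

    private NestedParser(String s) {
        this.s = s;
    }

    // Parses "(x,y(...),...)" and returns its top-level elements.
    public static List<Node> parse(String s) {
        NestedParser p = new NestedParser(s);
        List<Node> result = p.parseList();
        if (p.pos != s.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return result;
    }

    // list := '(' element (',' element)* ')'
    private List<Node> parseList() {
        expect('(');
        List<Node> items = new ArrayList<>();
        items.add(parseElement());
        while (peek() == ',') { // ',' starts a new element
            pos++;
            items.add(parseElement());
        }
        expect(')'); // ')' finishes the current nested list
        return items;
    }

    // element := label list?
    private Node parseElement() {
        int start = pos;
        while (pos < s.length() && "(),".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        Node node = new Node(s.substring(start, pos));
        if (peek() == '(') { // '(' starts a nested list: recurse
            node.children.addAll(parseList());
        }
        return node;
    }

    private char peek() {
        return pos < s.length() ? s.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        List<Node> top = parse("(a,b(1,2,3),c,d(a,b,c))");
        System.out.println(top);                 // [a, b(1,2,3), c, d(a,b,c)]
        System.out.println(top.get(3).children); // [a, b, c]
    }
}
```

The recursion depth grows with the nesting depth, so extremely deep inputs could overflow the call stack; the "unrolled" variant with an explicit stack avoids that.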

Thomas
  • Yes, I was thinking about going character by character, but I was hoping for a better, quicker solution. Isn't there any other way? – Zenoo Jan 24 '18 at 09:42
  • @Zenoo it depends on what your data actually represents, i.e. there might already be libraries that do that for you. If it's just that simple format, then it's not that hard to write yourself, and looking for a library might be more work than implementing it. – Thomas Jan 24 '18 at 09:44