1

I want to extract the string content inside square brackets. If a pair of square brackets contains nested square brackets, the nested ones should be ignored, i.e. kept as part of the outer content.

Example:

c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5

Should return:

 match1 = "ts[0],99:99,99:99";
 match2 = "ts[1],99:99,99:99, ts[2]";

The code I have so far works only with non-nested square brackets:

String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";

Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(in);

while(m.find()) {
    System.out.println(m.group(1));
}

// print: ts[0, ts[1, 2
Casimir et Hippolyte
Bằng Rikimaru
  • It should ignore nested brackets, but in `Should return` there are brackets nested inside the outer ones? Can you add an example that should not be returned? – The fourth bird May 17 '19 at 13:45

4 Answers

2

I made a function to do it (not with regex, but it works):

    String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
    String result = "";  // the input, rebuilt piece by piece

    for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        String part = String.valueOf(c);
        if (c == '[') {
            // Collect everything up to the matching closing bracket.
            part = "";
            int numberOfOpenBrackets = 1;
            for (int j = i + 1; j < in.length(); j++) {
                char d = in.charAt(j);
                if (d == '[') {
                    numberOfOpenBrackets++;
                }
                if (d == ']') {
                    numberOfOpenBrackets--;
                    i = j;  // the outer loop resumes after this bracket
                    if (numberOfOpenBrackets == 0) {
                        break;
                    }
                }
                part += d;
            }

            System.out.println(part);
            part = "[" + part + "]";
        }

        result += part;
    }

    // print: ts[0],99:99,99:99
    //        ts[1],99:99,99:99, ts[2]
Bằng Rikimaru
2

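If the nesting is just one level, you can search for a sequence between the brackets:

  • a sequence of:
    • either a character that is neither [ nor ]
    • or a [ followed by the shortest sequence up to ]

So

Pattern p = Pattern.compile("\\[((?:[^\\[\\]]|\\[.*?\\])*)\\]");
//   \\[ ... \\]     the outer brackets
//   [^\\[\\]]       a character that is neither [ nor ], or
//   \\[.*?\\]       a [, then the shortest sequence up to ]
//   (?: ... )*      repeatedly
//   ( ... )         group 1 holds the content between the outer brackets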

The limitation is that the brackets must be correctly paired: input with unbalanced brackets is not handled.
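As a rough, runnable sketch of the one-level idea: the class name `OneLevelBrackets` and the method `extract` are made up for illustration. The pattern here is written so that the first alternative excludes both `[` and `]` and the whole repetition sits in capturing group 1; that is an assumption about the intended behavior, not a definitive form of the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneLevelBrackets {

    // Outer [ ... ] whose content is made of non-bracket characters
    // or of inner [ ... ] runs (one level of nesting only).
    // Group 1 captures the content between the outer brackets.
    private static final Pattern OUTER =
            Pattern.compile("\\[((?:[^\\[\\]]|\\[.*?\\])*)\\]");

    // Hypothetical helper: returns the content of every top-level bracket pair.
    static List<String> extract(String in) {
        List<String> out = new ArrayList<>();
        Matcher m = OUTER.matcher(in);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
        for (String s : extract(in)) {
            System.out.println(s);
        }
        // prints:
        // ts[0],99:99,99:99
        // ts[1],99:99,99:99, ts[2]
    }
}
```

This only covers one level of nesting and well-formed input; for deeper nesting a bracket counter is probably the simpler route.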

Joop Eggen
1

Without regex, just plain Java:

import java.util.ArrayList;
import java.util.List;

public class BracketParser {

    public static List<String> parse(String target) throws Exception {
        List<String> results = new ArrayList<>();
        for (int idx = 0; idx < target.length(); idx++) {
            if (target.charAt(idx) == '[') {
                String result = readResult(target, idx + 1);
                if (result == null) throw new Exception();
                results.add(result);
                idx += result.length() + 1;
            }
        }
        return results;
    }

    private static String readResult(String target, int startIdx) {
        int openBrackets = 0;
        for (int idx = startIdx; idx < target.length(); idx++) {
            char c = target.charAt(idx);
            if (openBrackets == 0 && c == ']')
                return target.substring(startIdx, idx); 
            if (c == '[') openBrackets++;
            if (c == ']') openBrackets--;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5"));
    }
}

Complete code on GitHub

Marco R. - Bopsys LLC
0

You could use ts as the left boundary of your expression, add a right boundary, and lazily capture everything in between. Something similar to this expression might work:

(ts.*?)(\]\s+\+)

If other characters can follow the closing bracket besides the ones covered by (\s+\+), you can add them as alternatives (logical ORs) or in a character class, and it would still work.
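A minimal sketch of how this expression could be applied in Java, reading the content from group 1. The class `BoundaryRegexDemo`, the method `extract`, and the `WIDER` pattern are hypothetical names for illustration; `WIDER` is only one assumed way of allowing other arithmetic operators or the end of the input as the right boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryRegexDemo {

    // The expression above as a Java string literal:
    // lazily match from "ts" up to a "]" followed by whitespace and "+".
    static final Pattern PLUS_ONLY = Pattern.compile("(ts.*?)(\\]\\s+\\+)");

    // Assumed extension: the right boundary is "]" followed by optional
    // whitespace and one of + - * /, or by the end of the input.
    static final Pattern WIDER = Pattern.compile("(ts.*?)(\\]\\s*(?:[+\\-*/]|$))");

    // Collects group 1 (the text before the right boundary) of every match.
    static List<String> extract(Pattern p, String in) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(in);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
        for (String s : extract(PLUS_ONLY, in)) {
            System.out.println(s);
        }
        // prints:
        // ts[0],99:99,99:99
        // ts[1],99:99,99:99, ts[2]
    }
}
```

Note that this approach depends on the content starting with ts and on what follows the closing bracket, so it is tied to inputs shaped like the example rather than to bracket matching in general.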

RegEx

If this isn't the expression you want, you can modify it on regex101.com.

RegEx Circuit

You can also visualize your expressions in jex.im.

Emma