
Suppose that I want to build a very large regex with capture groups at run time, based on the user's choices.

Simple example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {    
    static boolean findTag, findWordA, findOtherWord, findWordX;

    static final String TAG = "(<[^>]+>)";
    static final String WORD_A = "(wordA)";
    static final String OTHER_WORD = "(anotherword)";
    static final String WORD_X = "(wordX)";

    static int tagCount = 0;
    static int wordACount = 0;
    static int otherWordCount = 0;
    static int wordXCount = 0;

    public static void main(String[] args) {
        // Boolean options that will be supplied by the user
        // make them all true in this example
        findTag = true;
        findWordA = true;
        findOtherWord = true;
        findWordX = true;

        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";

        StringBuilder regex = new StringBuilder();

        if (findTag)
            regex.append(TAG + "|");

        if (findWordA)
            regex.append(WORD_A + "|");

        if (findOtherWord)
            regex.append(OTHER_WORD + "|");

        if (findWordX)
            regex.append(WORD_X + "|");

        if (regex.length() > 0) {
            regex.setLength(regex.length() - 1);
            Pattern pattern = Pattern.compile(regex.toString());

            System.out.println("\nWHOLE REGEX: " + regex.toString());
            System.out.println("\nINPUT STRING: " + input);

            Matcher matcher = pattern.matcher(input);

            while (matcher.find()) {
                // only way I know of to find out which group was matched:
                if (matcher.group(1) != null) tagCount++;
                if (matcher.group(2) != null) wordACount++;
                if (matcher.group(3) != null) otherWordCount++;
                if (matcher.group(4) != null) wordXCount++;
            }

            System.out.println();
            System.out.println("Group1 matches: " + tagCount);
            System.out.println("Group2 matches: " + wordACount);
            System.out.println("Group3 matches: " + otherWordCount);
            System.out.println("Group4 matches: " + wordXCount);

        } else {
            System.out.println("No regex to build.");
        }
    }
}

The problem is that I can count each group's matches only when I know beforehand which regexes/groups the user wants to find.

Note that the full regex will contain many more capture groups, and they will be more complex.

How can I determine which capture group was matched so that I can count each group's occurrences, without knowing beforehand which groups the user wants to find?

Belphegor
AndroidX
  • Maybe an obvious answer, but you do realize you can use `groupCount()` to determine the number of groups? – Patrick Parker Dec 13 '16 at 00:29
  • Not really related, but `StringBuilder` is used precisely to avoid string concatenation (which creates additional `StringBuilder`s for each concatenation expression). So instead of `regex.append(TAG + "|");` use `regex.append(TAG).append("|");`. – Pshemo Dec 13 '16 at 00:35
  • @PatrickParker the number of groups is not enough, I need to know _which_ groups to count. – AndroidX Dec 13 '16 at 00:42
  • @Pshemo thanks, good to know ;) – AndroidX Dec 13 '16 at 00:44
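Following up on the `groupCount()` hint: group numbers alone can tell you which option matched, provided the code records which group number belongs to which option while it builds the regex. The sketch below is one way to do that, not something proposed in the thread. The class name, the `count` helper and the option labels are hypothetical. Each selected sub-regex is wrapped in one extra capturing group whose index is remembered, and `groupCount()` on the sub-regex alone is used to skip past any groups nested inside it, so more complex sub-regexes still map correctly. It also chains `append` calls as suggested above. It assumes the sub-regexes contain no numbered backreferences such as `\1`, because wrapping shifts group numbers.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexCounter {

    // `selected` maps a hypothetical, caller-chosen label to its sub-regex,
    // in the order the user enabled the options.
    // Assumption: sub-regexes use no numbered backreferences (\1, \2, ...).
    static Map<String, Integer> count(Map<String, String> selected, String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Map<Integer, String> labelByGroup = new HashMap<>();
        StringBuilder regex = new StringBuilder();
        int groupsSoFar = 0;
        for (Map.Entry<String, String> e : selected.entrySet()) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            // Wrap each sub-regex in one extra capturing group and remember its index.
            labelByGroup.put(groupsSoFar + 1, e.getKey());
            regex.append('(').append(e.getValue()).append(')');
            // Advance past the wrapper plus every capturing group inside the sub-regex.
            groupsSoFar += 1 + Pattern.compile(e.getValue()).matcher("").groupCount();
            counts.put(e.getKey(), 0);
        }
        if (counts.isEmpty()) {
            return counts; // nothing selected, so there is no regex to build
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(input);
        while (matcher.find()) {
            // Exactly one top-level alternative matched, so exactly one wrapper is non-null.
            for (Map.Entry<Integer, String> g : labelByGroup.entrySet()) {
                if (matcher.group(g.getKey()) != null) {
                    counts.merge(g.getValue(), 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical selection: the user enabled only the TAG and WORD_X options.
        Map<String, String> selected = new LinkedHashMap<>();
        selected.put("tag", "(<[^>]+>)");
        selected.put("wordX", "(wordX)");
        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
        System.out.println(count(selected, input)); // prints {tag=4, wordX=2}
    }
}
```

Because the labels travel with the sub-regexes, the counting loop never needs to know in advance which options were enabled.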

1 Answer

Construct the regex to use named groups:

(?<tag><[^>]+>)|(?<wordA>wordA)|(?<anotherword>anotherword)|(?<wordX>wordX)
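A minimal sketch of how the named-group idea can be used for counting, assuming each selected option is stored as a name plus a sub-regex. Each option becomes `(?<name>...)`, and after every `find()` the code calls `matcher.group(name)` for each selected name; the one that returns non-null is the option that matched. The class name, the `count` helper and the option names are hypothetical. Java requires group names to start with an ASCII letter and contain only ASCII letters and digits, so user-facing labels may need to be mapped to safe names first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCounter {

    // `selected` maps a group name (letters/digits, starting with a letter)
    // to its sub-regex, for only the options the user enabled.
    static Map<String, Integer> count(Map<String, String> selected, String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        List<String> names = new ArrayList<>(selected.keySet());
        List<String> alternatives = new ArrayList<>();
        for (String name : names) {
            alternatives.add("(?<" + name + ">" + selected.get(name) + ")");
            counts.put(name, 0);
        }
        if (alternatives.isEmpty()) {
            return counts; // nothing selected, so there is no regex to build
        }
        Matcher matcher = Pattern.compile(String.join("|", alternatives)).matcher(input);
        while (matcher.find()) {
            // The named group that participated in this match identifies the option.
            for (String name : names) {
                if (matcher.group(name) != null) {
                    counts.merge(name, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical selection: tags, wordA and anotherword are enabled.
        Map<String, String> selected = new LinkedHashMap<>();
        selected.put("tag", "<[^>]+>");
        selected.put("wordA", "wordA");
        selected.put("anotherword", "anotherword");
        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
        System.out.println(count(selected, input)); // prints {tag=4, wordA=1, anotherword=1}
    }
}
```

Unlike numeric indexes, the names stay valid no matter how many groups the individual sub-regexes contain, which suits the "many more, more complex groups" case in the question.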
Community
Scott Weaver