
It is my understanding that the java.util.regex package does not support named groups (http://www.regular-expressions.info/named.html), so can anyone point me towards a third-party library that does?

I've looked at jregex, but its last release was in 2002 and it didn't work for me under Java 5 (admittedly, I only tried briefly).

Dan

6 Answers


(Update: August 2011)

As geofflane mentions in his answer, Java 7 now supports named groups.
tchrist points out in the comments that the support is limited.
He details the limitations in his great answer "Java Regex Helper".

Java 7 regex named group support was presented back in September 2010 on Oracle's blog.

In the official release of Java 7, the constructs to support the named capturing group are:

  • (?<name>capturing text) to define a named group "name"
  • \k<name> to backreference a named group "name"
  • ${name} to reference a captured group in Matcher's replacement string
  • Matcher.group(String name) to return the input subsequence captured by the given named group.
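A small sketch of the first, second and fourth constructs on Java 7 or later: it finds doubled words such as "is is" by capturing a word under a name and backreferencing it with \k<name>. The class and method names are illustrative only, not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubledWords {
    // (?<word>\w+) defines a group named "word";
    // \k<word> only matches a repeat of the exact text that group captured.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b(?<word>\\w+) \\k<word>\\b");

    // Returns every doubled word, reading each capture by name.
    static List<String> findDoubled(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            found.add(m.group("word")); // Matcher.group(String), Java 7+
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("this is is a test test")); // [is, test]
    }
}
```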

Other alternatives for pre-Java 7 were:


(Original answer: Jan 2009, with the next two links now broken)

You cannot refer to named groups unless you code your own version of Regex...

That is precisely what Gorbush2 did in this thread.

Regex2

(This is a limited implementation, as tchrist points out again: it looks only for ASCII identifiers. tchrist details the limitations as:

only being able to have one named group per same name (which you don’t always have control over!) and not being able to use them for in-regex recursion.

Note: You can find true regex recursion examples in Perl and PCRE regexes, as mentioned in Regexp Power, the PCRE specs, and the Matching Strings with Balanced Parentheses slide.)

Example:

String:

"TEST 123"

RegExp:

"(?<login>\\w+) (?<id>\\d+)"

Access:

matcher.group(1) ==> TEST
matcher.group("login") ==> TEST
matcher.name(1) ==> login

Replace:

matcher.replaceAll("aaaaa_$1_sssss_$2____") ==> aaaaa_TEST_sssss_123____
matcher.replaceAll("aaaaa_${login}_sssss_${id}____") ==> aaaaa_TEST_sssss_123____ 
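For comparison, a sketch of the same example against the standard java.util.regex API of Java 7 or later (the class and method names are made up for illustration). Everything above except matcher.name(1) carries over unchanged; the JDK Matcher has no name(int) method, as discussed in the comments below.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoginIdExample {
    // The example's pattern, unchanged: Java 7+ accepts (?<name>...) natively.
    static final Pattern LOGIN_ID = Pattern.compile("(?<login>\\w+) (?<id>\\d+)");

    // Returns {group(1), group("login"), group("id")}, or null if the input does not match.
    static String[] access(String input) {
        Matcher m = LOGIN_ID.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group("login"), m.group("id") };
    }

    // Numbered references ($1, $2) in the replacement string.
    static String replaceNumbered(String input) {
        return LOGIN_ID.matcher(input).replaceAll("aaaaa_$1_sssss_$2____");
    }

    // Named references (${login}, ${id}) in the replacement string, Java 7+.
    static String replaceNamed(String input) {
        return LOGIN_ID.matcher(input).replaceAll("aaaaa_${login}_sssss_${id}____");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(access("TEST 123"))); // [TEST, TEST, 123]
        System.out.println(replaceNumbered("TEST 123"));         // aaaaa_TEST_sssss_123____
        System.out.println(replaceNamed("TEST 123"));            // aaaaa_TEST_sssss_123____
    }
}
```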

(extract from the implementation)

public final class Pattern
    implements java.io.Serializable
{
[...]
    /**
     * Parses a group and returns the head node of a set of nodes that process
     * the group. Sometimes a double return system is used where the tail is
     * returned in root.
     */
    private Node group0() {
        boolean capturingGroup = false;
        Node head = null;
        Node tail = null;
        int save = flags;
        root = null;
        int ch = next();
        if (ch == '?') {
            ch = skip();
            switch (ch) {

            case '<':   // (?<xxx)  look behind or group name
                ch = read();
                int start = cursor;
[...]
                // test forGroupName
                int startChar = ch;
                while(ASCII.isWord(ch) && ch != '>') ch=read();
                if(ch == '>'){
                    // valid group name
                    int len = cursor-start;
                    int[] newtemp = new int[2*(len) + 2];
                    //System.arraycopy(temp, start, newtemp, 0, len);
                    StringBuilder name = new StringBuilder();
                    for(int i = start; i< cursor; i++){
                        name.append((char)temp[i-1]);
                    }
                    // create Named group
                    head = createGroup(false);
                    ((GroupTail)root).name = name.toString();

                    capturingGroup = true;
                    tail = root;
                    head.next = expr(tail);
                    break;
                }
VonC
  • Both links above seem to be broken? – Jonas Jan 05 '10 at 04:28
  • This code is buggy. It is looking for ASCII identifiers. That’s wrong. It should be looking for anything that Java allows in an identifier!! – tchrist Aug 11 '11 at 22:41
  • Just FYI since you seem so conscientious, the limited part isn’t so much about the ASCII vs Unicode names as it is about only being able to have one named group per same name (which you don’t always have control over!) and not being able to use them for in-regex recursion. – tchrist Aug 12 '11 at 05:41
  • @tchrist: thank you for this precision (included). I have also added a link back to your stellar answer on "Java Regex helper" (upvoted). – VonC Aug 12 '11 at 05:54
  • Is there no matcher.name(int index) method on the Matcher object in Java? – alperc Jan 17 '19 at 12:49
  • @ot0 At least not in the JDK I was considering 10 years ago... Nowadays, I wouldn't know. I don't see one in their current up-to-date tutorial: https://docs.oracle.com/javase/tutorial/essential/regex/matcher.html – VonC Jan 17 '19 at 12:57

For people coming to this late: Java 7 adds named groups; see the Matcher.group(String groupName) documentation.

geofflane

Yes, but it's messy hacking the Sun classes. There is a simpler way:

http://code.google.com/p/named-regexp/

named-regexp is a thin wrapper for the standard JDK regular expressions implementation, with the single purpose of handling named capturing groups in the .NET style: (?<name>...).

It can be used with Java 5 and 6 (generics are used).

Java 7 will handle named capturing groups, so this project is not meant to last.
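The core idea of such a thin wrapper can be sketched with the JDK alone: rewrite each (?<name>...) into a plain (...) group, which Java 5 and 6 accept, and remember which group number each name received. This is a hypothetical minimal sketch, not named-regexp's actual API or implementation. It ignores \Q...\E quoting, does not translate \k<name> backreferences or ${name} in replacement strings, and treats everything between (?< and > as the name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the named-regexp API): translates (?<name>...) into
// plain numbered groups for the Java 5/6 engine and keeps a name -> number map.
public final class NamedPattern {
    private final Pattern pattern;
    private final Map<String, Integer> groupIndex;

    private NamedPattern(Pattern pattern, Map<String, Integer> groupIndex) {
        this.pattern = pattern;
        this.groupIndex = groupIndex;
    }

    public static NamedPattern compile(String regex) {
        StringBuilder out = new StringBuilder();
        Map<String, Integer> names = new HashMap<String, Integer>();
        int group = 0;      // capturing groups seen so far, in JDK numbering order
        int classDepth = 0; // > 0 inside [...], where '(' is a literal
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Copy escapes such as \( or \d unchanged.
                out.append(c).append(regex.charAt(i + 1));
                i += 2;
                continue;
            }
            if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (c == '(' && classDepth == 0) {
                // (?<name> is a named group; (?<= and (?<! are lookbehinds.
                boolean named = regex.startsWith("(?<", i) && i + 3 < regex.length()
                        && regex.charAt(i + 3) != '=' && regex.charAt(i + 3) != '!';
                if (named) {
                    int close = regex.indexOf('>', i + 3);
                    if (close < 0) {
                        throw new IllegalArgumentException("Unterminated group name at index " + i);
                    }
                    String name = regex.substring(i + 3, close);
                    group++;
                    if (names.put(name, group) != null) {
                        throw new IllegalArgumentException("Duplicate group name: " + name);
                    }
                    out.append('('); // emit a plain capturing group instead
                    i = close + 1;
                    continue;
                }
                if (!regex.startsWith("(?", i)) {
                    group++; // ordinary capturing group; (?:...), lookarounds and flags don't count
                }
            }
            out.append(c);
            i++;
        }
        return new NamedPattern(Pattern.compile(out.toString()), names);
    }

    public Matcher matcher(CharSequence input) {
        return pattern.matcher(input);
    }

    // Looks up the group number recorded for this name, then delegates to group(int).
    public String group(Matcher m, String name) {
        Integer index = groupIndex.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No group named " + name);
        }
        return m.group(index.intValue());
    }

    // The translated, purely numbered pattern handed to java.util.regex.
    public String pattern() {
        return pattern.pattern();
    }

    public static void main(String[] args) {
        NamedPattern p = NamedPattern.compile("(?<login>\\w+) (?<id>\\d+)");
        Matcher m = p.matcher("TEST 123");
        if (m.matches()) {
            System.out.println(p.group(m, "login")); // TEST
            System.out.println(p.group(m, "id"));    // 123
        }
        System.out.println(p.pattern());             // (\w+) (\d+)
    }
}
```

Because the rewritten pattern is an ordinary numbered-group pattern, matching itself runs on the unmodified JDK engine; the only extra work in this sketch is the one-time scan in compile.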

John Hardy
  • Too bad this can't be used from within GWT. – Sakuraba Dec 04 '09 at 09:58
  • Check out the [GitHub fork](https://github.com/tony19/named-regexp) of this project, which fixes several bugs from the original. It's also hosted in Maven Central. – tony19 Jul 16 '12 at 22:34
  • Just a word of caution: in my case, the tony19 fork on GitHub doesn't work on Android as of 0.1.8. – Chuck D Sep 28 '12 at 18:46
  • @RubberMallet, The Android-specific problem is now [fixed](https://tony19.atlassian.net/browse/REGEX-9) and will be in 0.1.9. – tony19 Dec 04 '12 at 03:53

For those running pre-Java 7, named groups are supported by joni (a Java port of the Oniguruma regexp library). Documentation is sparse, but it has worked well for us.
Binaries are available via Maven (http://repository.codehaus.org/org/jruby/joni/joni/).

Ryan Smith
  • I am very interested in the joni option mentioned by Ryan above. Do you have any code snippets using named capture groups? I have managed to get basic matching and searching to work correctly, but I don't see which method I would use to access the group names or to get a capture's value using the group name. – malsmith Jul 17 '12 at 19:40

What kind of problem did you have with jregex? It worked well for me under Java 5 and Java 6.

jregex does the job well (even if the last version is from 2002), unless you want to wait for Java SE 7.

Brian Clozel

This is a bit of an old question, but I found myself needing this too and found the suggestions above inadequate, so I developed a thin wrapper myself: https://github.com/hofmeister/MatchIt