I am trying to translate a section of C# code into Java, and while I am familiar with both languages, I am not very strong with their regex libraries.
MSDN gives this example:
String pattern = @"\D+(?<digit>\d+)\D+(?<digit>\d+)?";
And this output (I see they are using the capture index here, not the group name itself):
Match: abc123def456
Group 1: 456
Capture 0: 123
Capture 1: 456
With this note:
a group name can be repeated in a regular expression. For example, it is possible for more than one group to be named digit, as the following example illustrates. In the case of duplicate names, the value of the Group object is determined by the last successful capture in the input string.
So maybe this is a bad example (because my actual code isn't using digits), but anyways...
Translating that into Java, Pattern.compile isn't too happy about the second <digit>:
String pattern = "\\D+(?<digit>\\d+)\\D+(?<digit>\\d+)?";
Pattern p = Pattern.compile(pattern);
String matchMe = "abc123def456";
It errors at Pattern.compile with:
Named capturing group <digit> is already defined
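For reference, a minimal sketch that reproduces the failure. The class and method names are made up for illustration; the only library calls are Pattern.compile and PatternSyntaxException.getDescription.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateGroupDemo {
    // Returns the description of the syntax error, or null if the regex compiled.
    static String tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Same pattern as above: the name "digit" is used for two groups.
        String pattern = "\\D+(?<digit>\\d+)\\D+(?<digit>\\d+)?";
        System.out.println(tryCompile(pattern));
    }
}
```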
Removing all but the last name completely would be an option, I guess, seeing as that would "match" the C# behavior.
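A minimal sketch of that option, with illustrative class and method names. One caveat as far as I can tell: it only mirrors C# when the last group actually participates in the match. If the optional group is skipped, Java returns null for digit, whereas the MSDN note suggests C# would fall back to the last successful capture.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastNameOnly {
    // Only the last group keeps the name; the first becomes a plain numbered group.
    static final Pattern P = Pattern.compile("\\D+(\\d+)\\D+(?<digit>\\d+)?");

    // Returns the named group's value, or null if the input does not match
    // or the optional named group did not participate in the match.
    static String digit(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group("digit") : null;
    }

    public static void main(String[] args) {
        System.out.println(digit("abc123def456")); // prints 456
        System.out.println(digit("abc123def"));    // prints null (optional group skipped)
    }
}
```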
The problem comes back, though, when I try to nest patterns within one another like so:
String x = "(?<InnerData>...)no group(?<InnerGroup>foo)";
String y = "(?<header>[...])some data" + x + "more regex" + x;
Pattern.compile(y);
where x is inner content that repeats within y, and it's not something I can stick a repetition modifier onto.
I know it doesn't make sense to have groups of the same name because how would it know what you wanted?
So the question is: what can I do about that?
Is using Matcher.group(int) my only option, forgoing the group names?
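The only other idea I have is to build the repeated fragment with a distinct suffix per occurrence, so every name stays unique. A rough sketch of what I mean: the inner(int) helper, the suffixed names, and the stand-in sub-patterns and separators are all made up for illustration and are not my real regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixedGroups {
    // Hypothetical helper: returns the repeated fragment with a numeric suffix
    // appended to each group name. Java group names may contain only ASCII
    // letters and digits and must start with a letter, so a digit suffix is legal.
    static String inner(int n) {
        return "(?<InnerData" + n + ">\\d+)-(?<InnerGroup" + n + ">foo)";
    }

    // Stand-in for y: a header followed by two occurrences of the fragment.
    static Pattern build() {
        String y = "(?<header>[A-Z]+):" + inner(1) + ";" + inner(2);
        return Pattern.compile(y);
    }

    public static void main(String[] args) {
        Matcher m = build().matcher("HDR:12-foo;34-foo");
        if (m.matches()) {
            System.out.println(m.group("header"));     // prints HDR
            System.out.println(m.group("InnerData1")); // prints 12
            System.out.println(m.group("InnerData2")); // prints 34
        }
    }
}
```

That keeps lookup by name, at the cost of the caller having to know which occurrence it wants.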