I've identified some unexpected behavior in Java's regex implementation. When using java.util.regex.Pattern and java.util.regex.Matcher, the following regular expression does not match correctly on the input "Merlot" when using Matcher's find() method:
((?:White )?Zinfandel|Merlot)
If I change the order of the expressions inside the outermost matching group, Matcher's find() method does match:
(Merlot|(?:White )?Zinfandel)
Here is some test code that illustrates the problem.
RegexTest.java
    import java.util.regex.*;

    public class RegexTest {
        public static void main(String[] args) {
            Pattern pattern1 = Pattern.compile("((?:White )?Zinfandel|Merlot)");
            Matcher matcher1 = pattern1.matcher("Merlot");
            // prints "No match :("
            if (matcher1.find()) {
                System.out.println(matcher1.group(0));
            } else {
                System.out.println("No match :(");
            }

            Pattern pattern2 = Pattern.compile("(Merlot|(?:White )?Zinfandel)");
            Matcher matcher2 = pattern2.matcher("Merlot");
            // prints "Merlot"
            if (matcher2.find()) {
                System.out.println(matcher2.group(0));
            } else {
                System.out.println("No match :(");
            }
        }
    }
The expected output is:
Merlot
Merlot
But the actual output is:
No match :(
Merlot
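For anyone who wants to check this on another JVM, here is a more compact probe. It is only a sketch: the class name RegexProbe and the helper firstMatch are names I made up for illustration, and the extra inputs ("Zinfandel", "White Zinfandel") are there just to see whether the other alternatives are affected. It prints the java.version system property alongside each result, so outputs from different machines can be compared.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexProbe {
    // Hypothetical helper: returns the first substring that find() locates,
    // or null when find() reports no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        // The two patterns from the question, differing only in alternative order.
        String[] regexes = {
            "((?:White )?Zinfandel|Merlot)",
            "(Merlot|(?:White )?Zinfandel)"
        };
        // "Merlot" is the input from the question; the other two are extra probes.
        String[] inputs = { "Merlot", "Zinfandel", "White Zinfandel" };

        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " on \"" + input + "\" -> "
                        + firstMatch(regex, input));
            }
        }
    }
}
```

By the documented semantics of alternation, both patterns should report a match for every one of these inputs, so any line ending in "-> null" shows the behavior described above on that JVM.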
I've verified this unexpected behavior exists in Java version 1.7.0_11 on Ubuntu Linux and also Java version 1.6.0_37 on OS X 10.8.2. I reported this behavior as a bug to Oracle yesterday and got back an automated email telling me my bug report has been received and has an internal review ID of 2441589. I can't find my bug report when I search for that ID in their bug database. (Can you hear the crickets?)
Have I uncovered a bug in Java's presumably thoroughly tested and used regex implementation (hard to believe in 2013), or am I doing something wrong?