My question is quite simple, yet puzzling. There may be a simple switch that fixes this, but I don't have much experience with Java regexes...
String line = "";
line.replaceAll("(?i)(.)\\1{2,}", "$1");
This crashes. If I remove the (?i) switch, it works. The three Unicode characters are not random: they were found in the middle of a large Korean text, but I don't know whether they are valid.
The strange thing is that the regex works for all the other text except this. Why do I get the error?
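Since I'm not sure whether the characters are valid, one way to check might be to dump every code point in the string and flag unpaired surrogates. This is only a minimal sketch: the class `CodePointDump` and the helper `describe` are hypothetical names of mine, and the input in `main` is a placeholder, not the three characters from my text.

```java
final class CodePointDump {
    // Hypothetical helper: returns one line per code point in the string,
    // showing its value, how many chars it occupies, whether Java considers
    // it defined, and whether it is an unpaired (lone) surrogate.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            // codePointAt returns the raw surrogate value when it is unpaired
            boolean loneSurrogate = len == 1
                    && (Character.isHighSurrogate((char) cp)
                        || Character.isLowSurrogate((char) cp));
            out.append(String.format("U+%04X chars=%d defined=%b loneSurrogate=%b\n",
                    cp, len, Character.isDefined(cp), loneSurrogate));
            i += len; // advance by 1 or 2 chars, depending on the code point
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder input (assumption): a Hangul syllable repeated three times.
        String line = "\uD55C\uD55C\uD55C";
        System.out.print(describe(line));
    }
}
```

A line with `chars=2` would mean the character lies outside the Basic Multilingual Plane and is stored as a surrogate pair, while `loneSurrogate=true` would mean the string is malformed at that position.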
This is the exception I get:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 6
at java.lang.String.charAt(String.java:658)
at java.lang.Character.codePointAt(Character.java:4668)
at java.util.regex.Pattern$CIBackRef.match(Pattern.java:4846)
at java.util.regex.Pattern$Curly.match(Pattern.java:4125)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3694)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Start.match(Pattern.java:3408)
at java.util.regex.Matcher.search(Matcher.java:1199)
at java.util.regex.Matcher.find(Matcher.java:592)
at java.util.regex.Matcher.replaceAll(Matcher.java:902)
at java.lang.String.replaceAll(String.java:2162)
at tokenizer.Test.main(Test.java:51)