You could use something like:
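import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Place the following members inside your class.
private static final Pattern p = Pattern.compile(
        "(?<!\\d[^a-z\\d]{0,10000})"              // no digit (plus separators) before the match
        + "\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?" // 4 digits, optional 5th digit in group 2
        + "(?![^a-z\\d]*\\d)",                    // no digit (plus separators) after the match
        Pattern.CASE_INSENSITIVE);

public static String replaceSpecial(String text) {
    StringBuffer sb = new StringBuffer();
    Matcher m = p.matcher(text);
    while (m.find()) {
        // group(2) is null when the optional 5th digit did not take part in the match
        m.appendReplacement(sb, m.group(2) == null ? "****" : "*****");
    }
    m.appendTail(sb); // copy whatever follows the last match
    return sb.toString();
}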
Usage demo:
System.out.println(replaceSpecial("foo 123 56 78 bar 12 32 abc 000_00"));
System.out.println(replaceSpecial("0000"));
System.out.println(replaceSpecial("any text 00 00 more texts"));
System.out.println(replaceSpecial("any text 000 00 more texts 00"));
System.out.println(replaceSpecial("any text 000 00 more texts 00 00"));
System.out.println(replaceSpecial("any text 00-00 more texts 00_00"));
Result:
foo 123 56 78 bar **** abc *****
****
any text **** more texts
any text ***** more texts 00
any text ***** more texts ****
any text **** more texts ****
Idea/explanation:
We want to find a series of digits separated by zero or more characters that are neither digits nor letters (we can write that class as [^\\da-z], but IMO [^a-z\\d] looks better, so I will use this form). The series must contain 4 or 5 digits, which we can write as
digit([validSeparator]*digit){3,4} //1 digit + (3 OR 4 digits) => 4 OR 5 digits
but we need some way to recognize whether we matched 4 or 5 digits, because that decides whether we replace the match with 4 or 5 asterisks.
For this purpose I will put the 5th digit in a separate group and test whether that group took part in the match. So I will try to create something like dddd(d)?.
And that's how I came up with
"\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?"
// ^^^^^^^^^^^^^^^ possible 5th digit
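As a quick check of the optional-group idea, here is a minimal sketch that uses only this core pattern (no look-arounds yet) and reports whether 4 or 5 digits were matched. The class name `FifthDigitGroupDemo` and the helper `digitsMatched` are made up for illustration and are not part of the solution above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FifthDigitGroupDemo {
    // Core pattern only: 4 digits, then an optional 5th digit captured in group 2.
    static final Pattern CORE = Pattern.compile(
            "\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns 4 or 5 depending on whether group 2
    // took part in the first match, or -1 if nothing matched at all.
    static int digitsMatched(String text) {
        Matcher m = CORE.matcher(text);
        if (!m.find()) {
            return -1;
        }
        return m.group(2) == null ? 4 : 5;
    }

    public static void main(String[] args) {
        System.out.println(digitsMatched("12 34"));  // 4
        System.out.println(digitsMatched("000_00")); // 5
        System.out.println(digitsMatched("12a34"));  // -1 (a letter is not a valid separator)
    }
}
```

Note that m.group(2) is null, not an empty string, when the optional group did not participate in the match.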
Now we need to make sure that our regex matches only those dddd(d)? sequences which are not surrounded by any digit on the left or right, because we don't want to match any part of cases like
d ddddd
dddddd
ddddd d
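So we need to add tests which check that there is no digit (with optional valid separators) directly before or after our match. We can use negative look-around mechanisms here: the look-behind (?<!\\d[^a-z\\d]{0,10000}) and the look-ahead (?![^a-z\\d]*\\d). The look-behind uses {0,10000} rather than * because Java expects a look-behind to have a bounded maximum length.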
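To see what the look-arounds buy us, here is a small sketch comparing the core pattern with and without them on a six-digit series. The names `LookAroundGuardDemo`, `UNGUARDED`, `GUARDED` and `found` are chosen just for this illustration.

```java
import java.util.regex.Pattern;

class LookAroundGuardDemo {
    static final String CORE = "\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?";

    // Core pattern alone: happily matches a 4 or 5 digit slice of a longer series.
    static final Pattern UNGUARDED = Pattern.compile(CORE, Pattern.CASE_INSENSITIVE);

    // Core pattern wrapped in a negative look-behind and a negative look-ahead.
    static final Pattern GUARDED = Pattern.compile(
            "(?<!\\d[^a-z\\d]{0,10000})" + CORE + "(?![^a-z\\d]*\\d)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern matches anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String sixDigits = "1 23456";
        System.out.println(found(UNGUARDED, sixDigits));   // true: matches a slice of the series
        System.out.println(found(GUARDED, sixDigits));     // false: every candidate has a digit next to it
        System.out.println(found(GUARDED, "x 12-345 y"));  // true: exactly 5 digits, letters on both sides
    }
}
```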
So now all we need to do is combine these regexes (and make the result case-insensitive, or use a-zA-Z instead of a-z)
Pattern p = Pattern.compile( "(?<!\\d[^a-z\\d]{0,10000})"
+ "\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?"
+ "(?![^a-z\\d]*\\d)", Pattern.CASE_INSENSITIVE);
The rest is simple usage of the appendReplacement and appendTail methods from the Matcher class, which let us decide dynamically what to use as the replacement for each match found (I tried to explain it better here: https://stackoverflow.com/a/25081783/1393766).
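As a side note, if you are on Java 9 or newer (an assumption, since the code above also works on older versions), the same dynamic replacement can be sketched with Matcher.replaceAll and a lambda instead of the explicit loop. The class name `ReplaceAllLambdaDemo` is only for illustration.

```java
import java.util.regex.Pattern;

class ReplaceAllLambdaDemo {
    static final Pattern P = Pattern.compile(
            "(?<!\\d[^a-z\\d]{0,10000})"
            + "\\d([^a-z\\d]*\\d){3}([^a-z\\d]*\\d)?"
            + "(?![^a-z\\d]*\\d)", Pattern.CASE_INSENSITIVE);

    // Java 9+: the function is called once per match and returns its replacement.
    static String replaceSpecial(String text) {
        return P.matcher(text)
                .replaceAll(mr -> mr.group(2) == null ? "****" : "*****");
    }

    public static void main(String[] args) {
        // prints: any text ***** more texts ****
        System.out.println(replaceSpecial("any text 000 00 more texts 00 00"));
    }
}
```

The string returned by the lambda is still interpreted like a normal replacement string, so a literal $ or \ in it would need escaping; plain asterisks are safe.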