This is a tricky question, and maybe in the end it has no solution (or not a reasonable one, at least). I'd like to have a Java-specific example, but if it can be done, I think I could do it with any example.
My goal is to find a way of knowing whether a string being read from an input stream could still match a given regular expression pattern. In other words, I want to read the stream until I've got a string that definitely will not match that pattern, no matter how many characters are added to it.
A minimal declaration for a method to achieve this could be something like:
boolean couldMatch(CharSequence charsSoFar, Pattern pattern);
Such a method would return true if charsSoFar could still match pattern when new characters are added, or false if it has no chance at all of matching, even with new characters added.
To give a more concrete example, say we have a pattern for float numbers like "^([+-]?\\d*\\.?\\d*)$".
With such a pattern, couldMatch would return true for the following example values of the charsSoFar parameter:
"+"
"-"
"123"
".24"
"-1.04"
And so on, because you can keep adding digits to all of these, and also one dot to the first three.
On the other hand, all these examples, derived from the previous ones, should return false:
"+A"
"-B"
"123z"
".24."
"-1.04+"
It's clear at first sight that these will never comply with the aforementioned pattern, no matter how many characters you add to them.
EDIT:
I'm adding my current non-regex approach to make things clearer.
First, I declare the following functional interface:
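public interface Matcher {
    /**
     * It will return the matching part of "source" if any.
     *
     * @param source the characters read so far
     * @return the leading part of source that matches, or an empty sequence if nothing matches
     */
    CharSequence match(CharSequence source);
}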
Then, the previous function would be redefined as:
boolean couldMatch(CharSequence charsSoFar, Matcher matcher);
And a (drafted) matcher for floats could look like (note this does not support the + sign at the start, just the -):
public class FloatMatcher implements Matcher {
    @Override
    public CharSequence match(CharSequence source) {
        StringBuilder rtn = new StringBuilder();
        if (source.length() == 0)
            return "";
        char first = source.charAt(0);
        // The first character may be a minus sign, a dot or a digit;
        // anything else means there is no matching part at all.
        if ("0123456789-.".indexOf(first) == -1)
            return "";
        rtn.append(first);
        // A leading dot already uses up the single dot that is allowed.
        boolean gotDot = first == '.';
        for (int i = 1; i < source.length(); i++) {
            char c = source.charAt(i);
            if (gotDot) {
                // After the dot, only digits are accepted.
                if ("0123456789".indexOf(c) != -1) {
                    rtn.append(c);
                } else
                    return rtn.toString();
            } else if (".0123456789".indexOf(c) != -1) {
                rtn.append(c);
                if (c == '.')
                    gotDot = true;
            } else {
                return rtn.toString();
            }
        }
        return rtn.toString();
    }
}
Inside the omitted body of the couldMatch method, it will just call matcher.match() iteratively, each time with one more character added at the end of the source parameter. It returns true as long as the returned CharSequence is equal to the source parameter, and false as soon as they differ (meaning that the last character added broke the match).
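To make that description concrete, here is a rough sketch of what the couldMatch body could look like. It is only my reading of the idea, not tested code from the real project: the class name CouldMatchSketch and the DIGITS matcher are hypothetical stand-ins made up for the example, and the nested Matcher interface simply repeats the one declared above so the snippet compiles on its own.

```java
public class CouldMatchSketch {

    /** Same shape as the Matcher interface declared above, repeated so this compiles alone. */
    interface Matcher {
        CharSequence match(CharSequence source);
    }

    /**
     * Feeds the matcher one more character at a time and gives up as soon as
     * the matcher no longer returns the whole prefix it was given.
     */
    static boolean couldMatch(CharSequence charsSoFar, Matcher matcher) {
        for (int end = 1; end <= charsSoFar.length(); end++) {
            CharSequence prefix = charsSoFar.subSequence(0, end);
            CharSequence matched = matcher.match(prefix);
            // CharSequence has no content-based equals(), so compare the characters.
            if (!matched.toString().contentEquals(prefix)) {
                return false; // the last character added broke the match
            }
        }
        return true;
    }

    // Hypothetical stand-in matcher: returns the leading run of ASCII digits.
    static final Matcher DIGITS = source -> {
        int i = 0;
        while (i < source.length()
                && source.charAt(i) >= '0' && source.charAt(i) <= '9') {
            i++;
        }
        return source.subSequence(0, i);
    };

    public static void main(String[] args) {
        System.out.println(couldMatch("123", DIGITS));  // prints true
        System.out.println(couldMatch("123z", DIGITS)); // prints false
        System.out.println(couldMatch("", DIGITS));     // prints true
    }
}
```

With a matcher that always returns a leading part of its input, a single call on the full charsSoFar would give the same answer. The loop is only there to mirror the character-by-character reading from the stream.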