UPDATED ANSWER, following the comments on this answer
In fact, what is used is Scanner#findWithinHorizon, which calls Pattern#compile with a set of flags (Pattern#compile(String, int)).
The effect seems to be that this pattern is applied over and over again to the input text, one line of the file at a time; this of course assumes that a match cannot span multiple lines.
Therefore:
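import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Concatenates every match of the pattern, searching the file one line at a time
public static final String findInFile(final Path file, final String pattern,
    final int flags)
    throws IOException
{
    final StringBuilder sb = new StringBuilder();
    final Pattern p = Pattern.compile(pattern, flags);
    String line;
    Matcher m;
    try (
        // Files.newBufferedReader(Path) decodes the file as UTF-8
        final BufferedReader br = Files.newBufferedReader(file);
    ) {
        while ((line = br.readLine()) != null) {
            m = p.matcher(line);
            while (m.find())
                sb.append(m.group());
        }
    }
    return sb.toString();
}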
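For comparison, here is a minimal sketch that calls Scanner#findWithinHorizon directly over the whole file instead of reading it line by line; ScannerFind and findAll are hypothetical names. With a horizon of 0 the search is unbounded, so a match may cross line terminators, which the line-based method above cannot do. It assumes the pattern cannot match the empty string, since the loop might then never advance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only
public final class ScannerFind {
    private ScannerFind() {}

    // Concatenates every match of the pattern found anywhere in the file.
    // Assumption: the pattern cannot match the empty string.
    public static String findAll(final Path file, final String pattern, final int flags)
        throws IOException {
        final Pattern p = Pattern.compile(pattern, flags);
        final StringBuilder sb = new StringBuilder();
        // Scanner(Path, String charsetName) throws IOException if the file cannot be opened
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            String match;
            // A horizon of 0 means "no bound": the scanner reads as much input as needed
            // to find the next match, and returns null once the input is exhausted
            while ((match = scanner.findWithinHorizon(p, 0)) != null) {
                sb.append(match);
            }
        }
        return sb.toString();
    }
}
```

Note that Scanner buffers the input it has to look through, so a match spanning a very large region can make that buffer grow accordingly.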
For completeness, I should add that some time ago I developed a package which allows a text file of arbitrary length to be read as a CharSequence, and which could be used to great effect here: https://github.com/fge/largetext. It would work well in this case since a Matcher matches against a CharSequence, not a String. But this package needs some love.
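To illustrate that a Matcher accepts any CharSequence, here is a minimal sketch using only the JDK; it does not use the largetext API. The file is memory-mapped and decoded into a CharBuffer, which implements CharSequence, and the matcher runs over the whole content, so matches may span lines. Unlike largetext, this decodes the entire file onto the heap, and FileChannel#map is limited to Integer.MAX_VALUE bytes; CharSequenceFind and findAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only
public final class CharSequenceFind {
    private CharSequenceFind() {}

    // Returns every match of the pattern over the whole file content
    public static List<String> findAll(final Path file, final String pattern, final int flags)
        throws IOException {
        final Pattern p = Pattern.compile(pattern, flags);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the file read-only; map() rejects sizes above Integer.MAX_VALUE
            final MappedByteBuffer bytes =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Decoding allocates a CharBuffer (a CharSequence) holding all characters
            final CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
            final Matcher m = p.matcher(chars);
            final List<String> result = new ArrayList<>();
            while (m.find()) {
                result.add(m.group());
            }
            return result;
        }
    }
}
```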
One example returning a List of the lines in a file that contain a match could be:
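import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
// Returns every line in which the pattern is found (asPredicate() tests with find())
private static List<String> findLines(final Path path, final String pattern)
    throws IOException
{
    final Predicate<String> predicate = Pattern.compile(pattern).asPredicate();
    try (
        final Stream<String> stream = Files.lines(path);
    ) {
        return stream.filter(predicate).collect(Collectors.toList());
    }
}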
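If you want the matched substrings themselves rather than whole lines, a minimal sketch could flatMap each line's matches. It assumes Java 9 or later, for Matcher#results(); FindMatches and findMatches are hypothetical names. Like the methods above, it searches line by line, so a match cannot span lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only (requires Java 9+)
public final class FindMatches {
    private FindMatches() {}

    // Returns every matched substring, in file order
    public static List<String> findMatches(final Path path, final String pattern)
        throws IOException {
        final Pattern p = Pattern.compile(pattern);
        try (Stream<String> lines = Files.lines(path)) {
            return lines
                .flatMap(line -> p.matcher(line).results()) // one MatchResult per match
                .map(MatchResult::group)
                .collect(Collectors.toList());
        }
    }
}
```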