I would like to split a string on word boundaries. For now I am treating whitespace, ',', '.' and '!' as the boundaries between words.
In the following example:
String text = "This is, just a text to be used, for testing purpose. Nothing more!";
String[] words = text.split("[\\s+,.!]");
for (String w : words) {
    System.out.println(w);
}
This prints:
This
is

just
a
text
to
be
used

for
testing
purpose

Nothing
more
As you can see, there are empty words after the words that ended with ',' or '.' or '!'.
But if I add a '+' after the character class in my regex:
String[] words = text.split("[\\s+,.!]+");
for (String w : words) {
    System.out.println(w);
}
This prints:
This
is
just
a
text
to
be
used
for
testing
purpose
Nothing
more
The empty words are gone. Why is that '+' required to avoid the empty words?
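For reference, here is a self-contained sketch of the two calls above. The class name SplitDemo and the helper names splitSingle and splitRuns are made up for illustration. It prints each array with Arrays.toString, so an empty string shows up as a gap between two commas instead of a blank line that is easy to miss.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: the regex matches exactly ONE delimiter character per match.
    static String[] splitSingle(String text) {
        return text.split("[\\s+,.!]");
    }

    // Hypothetical helper: the regex matches a run of one or more
    // consecutive delimiter characters per match.
    static String[] splitRuns(String text) {
        return text.split("[\\s+,.!]+");
    }

    public static void main(String[] args) {
        String text = "This is, just a text to be used, for testing purpose. Nothing more!";
        // Same input, two regexes; compare the printed arrays side by side.
        System.out.println(Arrays.toString(splitSingle(text)));
        System.out.println(Arrays.toString(splitRuns(text)));
    }
}
```

On a small input such as "is, just", splitSingle returns three elements with an empty string in the middle, while splitRuns returns only "is" and "just".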