This should work:
$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>';
print_r($words);
echo '</pre>';
The output would be:
Array
(
    [0] => is
    [1] => is
)
Before I explain the regex, a quick note on PREG_SPLIT_NO_EMPTY. That flag tells preg_split to return only the non-empty pieces of the split. This ensures the array $words holds real data and not empty strings, which show up whenever a delimiter matches at the very start or end of the input or when two delimiters sit side by side. Here, the trailing "." would otherwise leave an empty string as the last element.
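To see what the flag changes, here is a small sketch that runs the same split with and without PREG_SPLIT_NO_EMPTY. The variable names $withEmpty and $withoutEmpty are mine, chosen only for illustration.

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$input   = 'is is.';

// Default flags: the "." at the end is a delimiter, so an empty
// string is left over after it.
$withEmpty = preg_split($pattern, $input);

// Same split, but empty pieces are dropped.
$withoutEmpty = preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);

print_r($withEmpty);    // three elements: "is", "is", ""
print_r($withoutEmpty); // two elements: "is", "is"
```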
And the explanation of that regex can be broken down like this using this tool:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
\w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and
                         something that is not a word char
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
[!?.]*                   any character of: '!', '?', '.' (0 or more
                         times (matching the most amount possible))
A nicer explanation can be found by entering the full regex pattern, /(?<=\w)\b\s*[!?.]*/, in this other tool:
(?<=\w)
Positive Lookbehind - Assert that the regex below can be matched
\w
match any word character [a-zA-Z0-9_]
\b
assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s*
match any white space character [\r\n\t\f ]
- Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
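[!?.]*
match a single character in the list !?. literally
- Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]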
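One way to check the breakdown above is to ask PHP what the pattern actually matches, that is, which parts of the string act as delimiters. This is a minimal sketch using preg_match_all; $count and $delimiters are illustrative names.

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Collect every delimiter the pattern matches in the sample string.
$count = preg_match_all($pattern, 'is is.', $delimiters);

var_dump($count);         // int(2)
var_dump($delimiters[0]); // the space after the first "is", then the "."
```

Neither match includes a word character: the look-behind only checks that one is there, so the words themselves survive the split.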
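That regex explanation can be boiled down by a human, also known as me, to the following:
Match the position right after a word character, at a word boundary, together with any whitespace and any of the punctuation marks !?. that follow, and split the string there. The word character itself is only looked at, not consumed, so it stays in the result.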
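Wrapped in a function, the idea might look like the sketch below. splitWords is a hypothetical helper name, not something PHP provides. One thing to watch for on longer input: the pattern looks for whitespace before punctuation, so a space that comes after a punctuation mark is not consumed and stays attached to the next word. This sketch trims each piece and drops any that end up empty.

```php
<?php
// Hypothetical helper (not part of PHP): split a sentence into words
// using the pattern discussed above.
function splitWords(string $text): array
{
    $pieces = preg_split('/(?<=\w)\b\s*[!?.]*/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($pieces === false) {
        return []; // preg_split reports failure with false
    }

    // Whitespace that follows punctuation is left on the next piece,
    // so trim every piece and discard anything that becomes empty.
    $trimmed = array_map('trim', $pieces);

    return array_values(array_filter($trimmed, fn ($piece) => $piece !== ''));
}

print_r(splitWords('is is.'));                    // is, is
print_r(splitWords('Hello world! How are you?')); // Hello, world, How, are, you
```

Without the trim step, the second call would return " How" with a leading space as its third element.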