preg_split() is a sensible, direct call here since you want an indexed array containing the two substrings.
Code:
$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));
Output:
array (
0 => 'AK',
1 => '747',
)
The \K token means "restart the fullstring match". Effectively, everything matched to the left of \K is dropped from the reported match, so it stays in the first element of the result array. Everything to the right of \K (the optional hyphen) is treated as the delimiter and omitted.
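To see what \K does to the match itself, a quick sketch can ask preg_match() for the match with PREG_OFFSET_CAPTURE, which reports the matched text together with its byte offset. The $reported array is just an illustrative name for collecting the results.

```php
<?php
$pattern = '/[a-z]{2,}\K-?/i';
$reported = [];

foreach (['AK-747', 'AK747'] as $input) {
    // PREG_OFFSET_CAPTURE makes $m[0] a [matchedText, byteOffset] pair.
    preg_match($pattern, $input, $m, PREG_OFFSET_CAPTURE);
    $reported[$input] = $m[0];
    // The letters were consumed but then discarded by \K, so the reported
    // match (the delimiter) starts at offset 2 in both cases.
    echo "$input: match '{$m[0][0]}' at offset {$m[0][1]}\n";
}
// Prints:
// AK-747: match '-' at offset 2
// AK747: match '' at offset 2
```

Because the reported match for AK747 is zero-length, preg_split() still gets a split point right after the letters even though no hyphen exists.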
Code:
I process a small battery of inputs to show what can be done and explain after the snippet.
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
echo "\n";
}
Output:
AK747 returns: array (
0 => 'AK',
1 => '747',
)
AK-747 returns: array (
0 => 'AK',
1 => '747',
)
AK- returns: array (
0 => 'AK',
)
AK returns: array (
0 => 'AK',
)
preg_split() receives a pattern that matches a (possibly variable) substring and uses each match as a delimiter. If - were present in every input string, then explode('-',$input) would be most appropriate. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier in -? does).
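A quick sketch of why explode() falls short here: it only splits where the hyphen actually exists, so a hyphen-less input comes back as a single, unsplit element.

```php
<?php
// explode() needs a literal delimiter to be present in the string.
$withHyphen    = explode('-', 'AK-747'); // 'AK', '747'
$withoutHyphen = explode('-', 'AK747');  // 'AK747' (nothing to split on)

var_export($withHyphen);
echo "\n";
var_export($withoutHyphen);
echo "\n";
```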
Now, you can't just use a pattern like /-?/ because it can match an empty string at every position, which would split the string between every character. To overcome this, you need to tell the regex engine the exact expected location of the optional -. You do this by placing [a-z]{2,} before -? so that only the position right after the leading letters qualifies as the (single) intended delimiter.
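A small sketch of that problem: with PREG_SPLIT_NO_EMPTY added to hide the empty strings that zero-length matches can leave behind, the remaining pieces are single characters and the hyphen has been consumed.

```php
<?php
// /-?/ matches either "-" or an empty string, and an empty string fits at
// every position, so every character boundary becomes a split point.
$pieces = preg_split('/-?/', 'AK-747', -1, PREG_SPLIT_NO_EMPTY);
var_export($pieces); // holds 'A', 'K', '7', '4', '7'
echo "\n";
```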
The pattern /[a-z]{2,}-?/i does a fair job of finding the correct location for the optional hyphen, but now the trouble is that the leading letters are included in the delimiting substring, so they are removed from the output along with the hyphen.
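Here is a minimal sketch of that pattern on one of the sample inputs: because the letters are part of the delimiter, the text before the delimiter is an empty string and AK is lost.

```php
<?php
// The whole "AK-" is consumed as the delimiter, leaving nothing before it.
$pieces = preg_split('/[a-z]{2,}-?/i', 'AK-747');
var_export($pieces); // holds '' and '747'
echo "\n";
```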
Sometimes, "lookarounds" can be used in regex patterns to match but not consume substrings. A "positive lookbehind" is used to match a preceding substring; however, PHP's PCRE engine does not permit a lookbehind of unbounded length such as [a-z]{2,} (many other regex flavors have similar restrictions). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.
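As a quick check (a sketch; the exact warning text depends on the bundled PCRE version), trying that lookbehind shows that the pattern fails to compile: preg_split() emits a warning and returns false instead of an array.

```php
<?php
// The @ operator suppresses the compilation warning for this demo only.
$result = @preg_split('/(?<=[a-z]{2,})-?/i', 'AK-747');
var_dump($result); // bool(false)
```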
The way around this technicality is to "restart the fullstring match" using the \K token (a common lookbehind alternative) just before the optional hyphen. To target only the intended delimiter, the leading letters must be "matched/consumed" and then "discarded"; that is exactly what \K does.
As for the inclusion of the 3rd and 4th parameters of preg_split():
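- I've set the 3rd parameter (limit) to 2. This works just like the limit parameter of explode(): it instructs the function to return no more than 2 output elements. For this case, I could have used -1 or 0 to mean "unlimited" (NULL was also accepted, but passing it is deprecated as of PHP 8.1). However, I could NOT simply omit the parameter when passing arguments positionally; it must be supplied to allow for the declaration of the 4th parameter (from PHP 8.0, a named argument such as flags: PREG_SPLIT_NO_EMPTY can skip it).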
- I've set the 4th parameter (flags) to PREG_SPLIT_NO_EMPTY, which instructs the function to omit empty strings from the output. Without it, an input like AK- would return a trailing empty element.
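The effect of both parameters can be seen in a short sketch. AB-CD-747 is a hypothetical input (it is not among the variations above), chosen because it contains two letter groups, so the limit actually matters.

```php
<?php
$pattern = '/[a-z]{2,}\K-?/i';

// Hypothetical input with two potential split points.
$limited   = preg_split($pattern, 'AB-CD-747', 2);  // 'AB', 'CD-747'
$unlimited = preg_split($pattern, 'AB-CD-747', -1); // 'AB', 'CD', '747'

// A trailing hyphen leaves an empty string after the delimiter...
$withEmpty = preg_split($pattern, 'AK-', 2);                      // 'AK', ''
// ...which PREG_SPLIT_NO_EMPTY discards.
$noEmpty   = preg_split($pattern, 'AK-', 2, PREG_SPLIT_NO_EMPTY); // 'AK'

foreach ([$limited, $unlimited, $withEmpty, $noEmpty] as $result) {
    var_export($result);
    echo "\n";
}
```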
Ta-Da!
P.S. A preg_match_all() solution is as easy as using a pipe (alternation) and two anchors:
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
echo "\n";
}
// same outputs as above
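To check the "same outputs" claim, a small sketch can run both approaches over the same inputs and compare the arrays. splitWithPregSplit() and splitWithMatchAll() are hypothetical wrapper names, not built-in functions. The comparison only covers these four variations; for example, a single trailing digit (AK7) would be split off by preg_split() but ignored by the \d{2,} branch of the preg_match_all() pattern.

```php
<?php
// Hypothetical wrapper around the preg_split() approach.
function splitWithPregSplit(string $input): array
{
    return preg_split('/[a-z]{2,}\K-?/i', $input, 2, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical wrapper around the preg_match_all() approach.
function splitWithMatchAll(string $input): array
{
    return preg_match_all('/^[a-z]{2,}|\d{2,}$/i', $input, $out) ? $out[0] : [];
}

$same = [];
foreach (['AK747', 'AK-747', 'AK-', 'AK'] as $input) {
    // true when both approaches return identical arrays
    $same[$input] = splitWithPregSplit($input) === splitWithMatchAll($input);
    echo $input, ': ', $same[$input] ? 'same' : 'different', "\n";
}
```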