First, why is your attempted pattern not delivering the desired output? Because the $ anchor tells the function to explode the string using only the final four digits as the "delimiter" (the characters that are consumed while dividing the string into separate parts).
Your result:
array (
0 => 'ABCDE1234ABCD1234ABCDEF', // an element of characters before the last four digits
1 => '', // an empty element containing the non-existent characters after the four digits
)
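That result can be reproduced with a short sketch. The exact pattern you tried is not quoted in this post, so `~\d{4}$~` below is only an assumed stand-in for any pattern that matches four digits anchored to the end of the string.

```php
<?php
// Assumed stand-in for the attempted pattern: four digits anchored to the end.
$anchored = preg_split('~\d{4}$~', 'ABCDE1234ABCD1234ABCDEF1234');

// Only the final "1234" matches, so it is consumed as the one and only delimiter.
var_export($anchored);
```

The first element holds everything before the final four digits; the second is the empty remainder after them.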
In plain English, to fix your pattern, you must:
- Not consume any characters while exploding and
- Ensure that no empty elements are generated.
My snippet is at the bottom of this post.
Second, there seems to be some debate about which regex function to use (or even whether regex is a preferable tool).
- My stance is that a non-regex method will require a long-winded block of lines that is as difficult to read as a regex pattern, if not more so. Regex lets you generate your result in one line without being unsightly. So let's dispose of iterated sets of conditions for this task.
Now the critical concern is whether this task is simply "extracting" data from a consistent and valid string (case "A"), or "validating AND extracting" data from a string (case "B") because the input cannot be 100% trusted to be consistent/correct.
- In case A, you needn't concern yourself with producing valid elements in the output, so preg_split() or preg_match_all() are good candidates.
- In case B, preg_split() would not be advisable, because it only hunts for delimiting substrings -- it remains ignorant of all other characters in the string.
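For case B, one possible approach is to validate the whole string first and only then extract. This is a rough sketch, not a definitive implementation: the function name `extractValidatedGroups` is hypothetical, and the rule "one or more letters followed by exactly four digits, repeated" is an assumption inferred from the sample input.

```php
<?php
// Hypothetical helper: reject the input unless the ENTIRE string is made of
// "letters + four digits" groups, then return those groups as a flat array.
function extractValidatedGroups(string $input): array
{
    // \z asserts the very end of the string (no trailing newline allowed).
    if (!preg_match('~^(?:[a-z]+\d{4})+\z~i', $input)) {
        return []; // assumed failure behaviour: empty array for invalid input
    }

    preg_match_all('~[a-z]+\d{4}~i', $input, $groups);

    return $groups[0]; // the fullstring matches
}

var_export(extractValidatedGroups('ABCDE1234ABCD1234ABCDEF1234'));
var_export(extractValidatedGroups('ABCDE1234ABCD12ABCDEF1234')); // malformed middle group
```

Unlike preg_split(), this inspects every character, so a malformed group anywhere in the string causes the whole input to be rejected.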
Assuming this task is case A, a decision is still pending about the better function to call. Both functions generate an array, but preg_match_all() creates a multidimensional array while you desire a flat array (like preg_split() provides). This means you would need to introduce an extra variable into the scope ($matches) and append [0] to it to access the desired fullstring matches. To someone who doesn't understand regex patterns, this may border on the bad practice of using "magic numbers".
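To illustrate the difference in shape, here is a minimal sketch of the preg_match_all() route using the sample input from this post (the variable names are arbitrary):

```php
<?php
// preg_match_all() returns the number of matches and fills $matches by reference.
$matchCount = preg_match_all('~\D+\d{4}~', 'ABCDE1234ABCD1234ABCDEF1234', $matches);

var_export($matches);    // nested array: index 0 holds the fullstring matches
var_export($matches[0]); // the flat list you actually want
```

The pattern has no capture groups, so $matches contains only the [0] subarray, but the extra level of nesting is still there.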
For me, I strive to code for Directness and Accuracy, then Efficiency, then Brevity and Clarity. Since you're not likely to notice any performance drop while performing such a small operation, efficiency isn't terribly important. I just want to make some comparisons to highlight the cost of a pattern that leverages only look-arounds, or of a pattern that misses an opportunity to greedily match predictable characters.
/(?<=\d{4})(?=[a-z])/i
79 steps (Demo)
~\d{4}\K~
25 steps (Demo)
/[a-z]+[0-9]{4}\K/i
13 steps (Demo)
~\D+[0-9]{4}\K~
13 steps (Demo)
~\D+\d{4}\K~
13 steps (Demo)
FYI, \K is a metacharacter that means "restart the fullstring match", in other words "forget/release all previously matched characters up to this point". This effectively ensures that no characters are lost while splitting.
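The effect of \K is easiest to see by running the same pattern with and without it. This is a small sketch using the sample input from this post; the variable names are arbitrary.

```php
<?php
$input = 'ABCDE1234ABCD1234ABCDEF1234';

// Without \K, every "letters + digits" group is consumed as a delimiter,
// so only the empty gaps between the delimiters remain.
$withoutK = preg_split('~\D+\d{4}~', $input);

// With \K, the match is reduced to a zero-length position after each group,
// so nothing is consumed. PREG_SPLIT_NO_EMPTY drops any empty elements.
$withK = preg_split('~\D+\d{4}\K~', $input, 0, PREG_SPLIT_NO_EMPTY);

var_export($withoutK);
var_export($withK);
```

The first call returns four empty strings; the second returns the three intact groups.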
Suggested technique: (Demo)
var_export(
    preg_split(
        '~\D+\d{4}\K~',                 // pattern
        'ABCDE1234ABCD1234ABCDEF1234',  // input
        0,                              // make unlimited explosions
        PREG_SPLIT_NO_EMPTY             // exclude empty elements
    )
);
Output:
array (
0 => 'ABCDE1234',
1 => 'ABCD1234',
2 => 'ABCDEF1234',
)