1

I have the word AK747. I use a regex to detect whether a string (at least 2 letters, e.g. AK) is followed by a number (at least two digits, e.g. 747). EDIT: (sorry I wasn't clear on this, guys) I need to do this because:

In some cases I need to split it so that I can match a search against AK-747. When I search for the string 'AK-747' with the keyword 'AK747', it won't find a match unless I use Levenshtein distance in the database, so I prefer splitting AK747 into AK and 747.

My code:

$strNumMatch = preg_match('/^[a-zA-Z]{2,}[0-9]{2,}$/', $value, $match);

if(isset($match[0]))
    echo $match[0];

How do I split it into the array ['AK', '747'], for example with preg_split() or any other way?

KeitelDOG
  • so it should split on both cases `AK747` and `AK-747`? – RomanPerekhrest Jan 08 '18 at 21:53
  • @jh1711 cheers! Thanks. The group capture was the key. I just accepted an answer similar to your comment, and I upvoted your comment. Your name jh1711 looks like the example; maybe that's why you knew it so easily. – KeitelDOG Jan 09 '18 at 01:16
  • @mickmackusa, done! I use comments to post quick thoughts that might be a solution, but don't have proper explanation or references to count as a good answer (imo). Everybody is free to use those thoughts, and I did not see any harm. But I will read guidelines again, and rethink my strategy. – jh1711 Jan 09 '18 at 05:50
  • @mickmackusa thanks for the tip, I'll use it from now on. – KeitelDOG Jan 09 '18 at 06:55
  • @jh1711 while rethinking your strategy, please consider some of my points @ https://meta.stackexchange.com/questions/230676/hey-you-yeah-you-post-your-answers-as-answers-not-comments/296481#296481 – mickmackusa Jan 09 '18 at 15:51

4 Answers

1

You may try this:

preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE);
$position = $matches[0][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);

This way you get the position of the first run of two or more digits and split the string there.

EDIT: Starting from your original approach this could look somewhat like this:

$strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $matches, PREG_OFFSET_CAPTURE);
if ($strNumMatch) {
    // With PREG_OFFSET_CAPTURE, $matches[2][1] is the offset where the digit group starts
    $position = $matches[2][1];
    $letters = substr($value, 0, $position);
    $numbers = substr($value, $position);
    $alternative = $letters.'-'.$numbers;
}
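For comparison, here is a minimal sketch of the variant described in the comments below: the capture groups already contain both parts, so neither PREG_OFFSET_CAPTURE nor substr() is needed. The sample value assigned to `$value` is an assumption for illustration.

```php
<?php
$value = 'AK747'; // sample input (assumption)

// Group 1 = leading letters, group 2 = trailing digits.
if (preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $matches)) {
    $letters = $matches[1];                   // 'AK'
    $numbers = $matches[2];                   // '747'
    $alternative = $letters . '-' . $numbers; // 'AK-747'
    echo $alternative, "\n";
}
```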
marcus.kreusch
  • Many thanks, man! That's exactly what I wanted. The key was to use group capture with parentheses. Only, instead of extracting the strings by position, I use $strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match); and I leave out the PREG_OFFSET_CAPTURE option. I can now access the values in the array as Array ([0] => AK747 [1] => AK [2] => 747). – KeitelDOG Jan 09 '18 at 01:24
  • If you have to use `substr()` to extract substrings AFTER calling a `preg_` function, then you've probably called the wrong function and/or used an inadequate pattern. This is not a direct/professional solution. – mickmackusa Jan 10 '18 at 21:59
1
$input = 'AK-747';

if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
    unset($result[0]);
}

print_r($result);

The output:

Array
(
    [1] => AK
    [2] => 747
)
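Because unset() keeps the original keys, this result is indexed from 1. If a zero-based ['AK', '747'] is wanted, as in the question, one option (a sketch, not part of the original answer) is array_slice(), which drops the full match and renumbers the remaining groups:

```php
<?php
$input = 'AK-747';

if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
    // array_slice() reindexes integer keys, so groups 1 and 2 become [0] and [1].
    $parts = array_slice($result, 1);
} else {
    $parts = []; // no match: nothing to split
}

print_r($parts);
```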
RomanPerekhrest
  • Sorry I wasn't clear on that part of my question. I only need AK747 to split into AK and 747; I didn't need it for AK-747. But you got it right with the optional - and the group capture. I had already accepted a similar answer before I read and tested yours. But you deserve it. – KeitelDOG Jan 09 '18 at 01:33
  • I accepted this answer instead for the sake of this community. It is simpler, clearer, more direct, and even takes my misleading "-" into account. So I think other users might find it easier and more appropriate; that's why I switched to this answer. – KeitelDOG Jan 09 '18 at 03:13
1

preg_split() is a very sensible and direct call since you desire an indexed array containing the two substrings.

Code:

$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));

Output:

array (
  0 => 'AK',
  1 => '747',
)

The \K means "restart the fullstring match". Effectively, everything matched to the left of \K (the letters) is not part of the delimiter, so it is retained as the first element in the result array, while everything to the right of it (the optional hyphen) is omitted because it is the delimiter.
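One way to see the restart is to run the same pattern through preg_match() with PREG_OFFSET_CAPTURE (an illustrative sketch, not part of the original answer): the reported match is only the delimiter at offset 2, either the hyphen or an empty string, and never the letters.

```php
<?php
// The same pattern used in the preg_split() call above.
$pattern = '/[a-z]{2,}\K-?/i';
$found = [];

foreach (['AK-747', 'AK747'] as $input) {
    preg_match($pattern, $input, $m, PREG_OFFSET_CAPTURE);
    // $m[0] is [matched text, byte offset]; the letters are not part of it.
    printf("%s: delimiter '%s' at offset %d\n", $input, $m[0][0], $m[0][1]);
    $found[$input] = $m[0];
}
```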


I process a small battery of inputs to show what can be done, and explain after the snippet.

Code:

$inputs=['AK747','AK-747','AK-','AK'];  // variations as I understand them
foreach($inputs as $input){
    echo "$input returns: ";
    var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
    echo "\n";
}

Output:

AK747 returns: array (
  0 => 'AK',
  1 => '747',
)
AK-747 returns: array (
  0 => 'AK',
  1 => '747',
)
AK- returns: array (
  0 => 'AK',
)
AK returns: array (
  0 => 'AK',
)

preg_split() takes a pattern that matches a (possibly variable) substring and uses each match as a delimiter. If - were present in every input string, then explode('-', $input) would be most appropriate. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier does in the patterns on this page that handle the hyphen).
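The explode() point can be checked directly with a short sketch: explode() works when the hyphen is present, but returns the whole input as a single element when it is not.

```php
<?php
$withHyphen    = explode('-', 'AK-747'); // split on the hyphen
$withoutHyphen = explode('-', 'AK747');  // nothing to split on

var_export($withHyphen);
echo "\n";
var_export($withoutHyphen);
echo "\n";
```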

You can't just use a pattern like /-?/, though: because it can also match an empty string, it would split the string between every character. To overcome this, you need to tell the regex engine the exact expected location of the optional -. You do this by putting [a-z]{2,} before the -? (the single intended delimiter).
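A quick sketch of that failure mode: PREG_SPLIT_NO_EMPTY is added here only to drop the empty pieces at the start and end, which leaves one element per character.

```php
<?php
// '-?' can match the empty string, so it finds a "delimiter" at every position.
$pieces = preg_split('/-?/', 'AK747', -1, PREG_SPLIT_NO_EMPTY);

var_export($pieces);
echo "\n";
```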

The pattern /[a-z]{2,}-?/i does a fair job of finding the correct location for the optional hyphen, but now the trouble is, the leading letters in the string are included as part of the delimiting substring.
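To make the problem concrete, here is a sketch of that intermediate pattern on 'AK-747': the letters are consumed as part of the delimiter, so they disappear and the first element is an empty string.

```php
<?php
// Without \K, 'AK-' as a whole is the delimiter and is thrown away.
$pieces = preg_split('/[a-z]{2,}-?/i', 'AK-747');

var_export($pieces);
echo "\n";
```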

Sometimes, "lookarounds" can be used in regex patterns to match but not consume substrings. A "positive lookbehind" is used to match a preceding substring; however, lookbehinds of unbounded length (such as one containing [a-z]{2,}) are not permitted in PHP (or in most other regex flavors). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.
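To confirm that the lookbehind version is rejected, here is a small sketch. The exact warning text depends on the bundled PCRE2 version, so only the return value is checked, and the `@` operator is used purely to silence the warning in this demo.

```php
<?php
// The pattern fails to compile, so preg_split() emits a warning and returns false.
$result = @preg_split('/(?<=[a-z]{2,})-?/i', 'AK-747');

var_dump($result);
```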

The way around this technicality is to "restart the fullstring match" using the \K token (aka a lookbehind alternative) just before the optional hyphen. To correctly target only the intended delimiter, the leading letters must be "matched/consumed" and then "discarded", which is exactly what \K does.

As for the inclusion of the 3rd and 4th parameters of preg_split()...

  • I've set the 3rd parameter to 2. This is just like the limit parameter that explode() has: it instructs the function to return no more than 2 output elements. For this case, I could have used -1 or 0 to mean "unlimited" (NULL also worked at the time, but passing null here is deprecated as of PHP 8.1). However, when passing arguments positionally I could NOT leave the parameter out; it must be supplied to allow for the declaration of the 4th parameter.
  • I've set the 4th parameter to PREG_SPLIT_NO_EMPTY which instructs the function to not generate empty output elements.
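Here is a small sketch of what these parameters change, using the 'AK' input discussed in the comments (which otherwise produces a trailing empty element). The last call assumes PHP 8.0 or later, where a named argument can pass the flags while the limit keeps its default of -1.

```php
<?php
$pattern = '/[a-z]{2,}\K-?/i';

$plain   = preg_split($pattern, 'AK');                         // trailing '' kept
$noEmpty = preg_split($pattern, 'AK', 2, PREG_SPLIT_NO_EMPTY); // empty piece dropped

// PHP 8.0+ only: named argument, limit left at its default (no limit).
$named = preg_split($pattern, 'AK', flags: PREG_SPLIT_NO_EMPTY);

var_export([$plain, $noEmpty, $named]);
echo "\n";
```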

Ta-Da!



P.S. A preg_match_all() solution is as easy as using a pipe (alternation) and two anchors:

$inputs=['AK747','AK-747','AK-','AK'];  // variations as I understand them
foreach($inputs as $input){
    echo "$input returns: ";
    var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
    echo "\n";
}
// same outputs as above
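The anchors are what keep this version strict: ^ ties the letters to the start of the string and $ ties the digits to the end. Here is a quick comparison on a hypothetical input that has letters and digits in the middle (not one of the question's cases):

```php
<?php
$input = 'AB12CD34'; // hypothetical input, for illustration only

preg_match_all('/^[a-z]{2,}|\d{2,}$/i', $input, $anchored);
preg_match_all('/[a-z]{2,}|\d{2,}/i', $input, $unanchored);

var_export($anchored[0]);   // only the leading letters and the trailing digits
echo "\n";
var_export($unanchored[0]); // every run of 2+ letters or 2+ digits
echo "\n";
```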
mickmackusa
  • Thanks. I just tested it, and basically it separates the letters from any non-letter characters that follow them, not just digits. I understand `preg_split()` now. – KeitelDOG Jan 09 '18 at 19:32
  • Also, I noticed that for pure text it splits off an empty string after the last letter. For example, 'AK' outputs: `array ( 0 => 'AK', 1 => '', )`. So it effectively splits any text. That may be useful, but in some cases you may want a split only when something non-empty follows the letters. – KeitelDOG Jan 10 '18 at 15:32
  • @KeitelJovin I've added a new expanded snippet/demo and a whole bunch of explanation to help you understand how it works. Ask me more questions if anything is unclear to you. – mickmackusa Jan 10 '18 at 21:54
  • Many thanks! Especially for the `PREG_SPLIT_NO_EMPTY` flag and the limit. That prepares my mind for a much more flexible approach in future iterations. – KeitelDOG Jan 11 '18 at 16:42
0

You can make the - optional with ?.

/([A-Za-z]{2,}-?[0-9]{2,})/

https://regex101.com/r/tIgM4F/1

Andreas