Let's say I have a string like this:

$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

and I want to get an array like this:

array(
    [0] => "http://foo.com/bar",
    [1] => "https://bar.com",
    [2] => "//foo.com/foo/bar"
);

I'm looking for something like:

preg_split("~((https?:)?//)~", $urlsString, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);

where PREG_SPLIT_DELIM_CAPTURE is documented as:

If this flag is set, parenthesized expressions in the delimiter pattern will be captured and returned as well.

That said, the above preg_split returns:

array (size=3)
  0 => string '' (length=0)
  1 => string 'foo.com/bar' (length=11)
  2 => string 'bar.com//foo.com/foo/bar' (length=24)

Any idea what I'm doing wrong, or is there a better approach?

PS: I was using this regex until I realized that it doesn't cover this case.

Edit:

As @sidyll pointed out, I'm missing the $limit argument in the preg_split call. There is also something wrong with my regex, so I will use @WiktorStribiżew's suggestion.

Manolo
  • `preg_split` takes four arguments: the third is the limit and the fourth is the flags. You're passing the flags as the limit. Even with that fixed, this won't produce what you're expecting: `DELIM_CAPTURE` returns the delimiters as separate elements, and your regex is ambiguous about delimiters (it can match `http://` as `http://` or as just `//`, two different delimiters). – sidyll Apr 19 '17 at 12:31
  • Oh! You're right. Thank you. – Manolo Apr 19 '17 at 12:33
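
To make the argument-order point concrete, here is a minimal sketch (not part of the original thread). It assumes the inner scheme group is made non-capturing so that only the whole delimiter is captured, and that every delimiter is followed by non-empty text; the variable names are illustrative only.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
$pattern = '~((?:https?:)?//)~';

// The two flags OR together to the integer 3, so passing them as the third
// argument is the same as asking for a limit of 3 pieces with no flags.
$asLimit = preg_split($pattern, $urlsString, 3);

// Flags belong in the fourth argument; -1 as the third means "no limit".
$parts = preg_split(
    $pattern,
    $urlsString,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
// $parts now alternates: delimiter, remainder, delimiter, remainder, ...

// Glue each captured delimiter back onto the text that follows it.
$urls = [];
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $urls[] = $parts[$i] . $parts[$i + 1];
}
print_r($urls);
```

The re-gluing loop is only reliable under the assumptions stated above, which is one reason a matching approach may be simpler than a splitting one here.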

1 Answer

You may use preg_match_all with the following regex:

'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'

See the regex demo.

Details:

  • (?:https?:)? - an optional https: or http: (1 or 0 times)
  • // - a double /
  • .*? - any 0+ chars other than line breaks, as few as possible, up to the first position where the lookahead that follows succeeds
  • (?=$|(?:https?:)?//) - a positive lookahead that requires either of the two immediately ahead:
    • $ - end of string
    • (?:https?:)?// - an optional https: or http: (1 or 0 times), followed by a double /

Below is a PHP demo:

$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $urlsString, $urls);
print_r($urls[0]); // $urls[0] holds the full matches
// => Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )
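
As a small illustration of why the `$` alternative in the lookahead matters, the sketch below compares the pattern above with a variant that omits it. The variant and the variable names are added here for comparison only and are not something the answer proposes.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// Pattern from the answer: the lookahead accepts end of string
// or the start of the next URL.
$withEnd = '~(?:https?:)?//.*?(?=$|(?:https?:)?//)~';

// Same pattern without the "$" alternative (comparison only).
$withoutEnd = '~(?:https?:)?//.*?(?=(?:https?:)?//)~';

preg_match_all($withEnd, $urlsString, $m1);
preg_match_all($withoutEnd, $urlsString, $m2);

// Index 0 holds the full matches; neither pattern has capturing groups.
$all = $m1[0];
$truncated = $m2[0];

print_r($all);       // all three URLs
print_r($truncated); // the last URL is missing: nothing follows it to satisfy the lookahead
```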
Wiktor Stribiżew