
Okay, this has had me stumped for days. I've tried a regex with a negative lookahead, but to no avail.

Basically, in PHP, I need to parse conversation threads and extract the LAST occurrence of http links, which can be either a single link on its own or a consecutive group of 2 or more links. So in Example 1 it should return only the last link, but in Example 2 it should return the last 3 links.

I don’t need to achieve this with a single regex, but I’m not sure what other approaches to try. Any help would be appreciated!!

EXAMPLE 1:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

http://sample.com/12345.png

In pharetra elementum dui vel pretium. Quisque rutrum mauris vitae turpis hendrerit facilisis. Sed ultrices imperdiet ornare.

http://sample.com/13578.png


EXAMPLE 2:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

http://sample.com/12345.png

In pharetra elementum dui vel pretium. Quisque rutrum mauris vitae turpis hendrerit facilisis. Sed ultrices imperdiet ornare.

http://sample.com/24689.png
http://sample.com/13578.png
http://sample.com/98761.png


41686d6564
Chungster

1 Answer

1) Split your text on whitespace (the delimiter \s):

$resultArray = preg_split("@\s@", $conversation);

For example, with:

$conversation = "Hallo, http://1.de text http://2.de\r\nhttp://3.de Hello";

this produces the following intermediate result (the empty entry comes from the split between \r and \n):

Array
(
    [0] => Hallo,
    [1] => http://1.de
    [2] => text
    [3] => http://2.de
    [4] => 
    [5] => http://3.de
    [6] => Hello
)
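
If the empty entries are unwanted, one option is to let preg_split drop them. The sketch below is a variation on step 1, not part of the original approach: it splits on runs of whitespace (\s+) and passes the PREG_SPLIT_NO_EMPTY flag. The variable name $tokens is only illustrative.

```php
<?php
// Same sample input as in the example above.
$conversation = "Hallo, http://1.de text http://2.de\r\nhttp://3.de Hello";

// \s+ treats a run of whitespace (such as "\r\n") as a single delimiter;
// PREG_SPLIT_NO_EMPTY discards any empty strings that would still remain.
$tokens = preg_split('/\s+/', $conversation, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```

With this variation the array holds six entries and no empty string, so the later loop would not need a separate branch for blank tokens.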

2) Finally, iterate over the result array in reverse. Start "matching" as soon as an entry starts with "http://", and once matching has started, stop at the first entry that is anything else. Ignore empty entries and entries that contain only whitespace:

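$conversation = "Hallo, http://1.de text http://2.de\r\nhttp://3.de Hello";
$resultArray = preg_split("@\s@", $conversation);
$result = array();

$matching = false;
// Walk the entries from the end of the conversation towards the start.
for ($i = count($resultArray) - 1; $i >= 0; $i--) {
    if (preg_match("@^http://@", $resultArray[$i])) {
        // Entry starts with "http://": it belongs to the last link group.
        $matching = true;
        $result[] = $resultArray[$i];
    } else if (preg_match("@^\s*$@", $resultArray[$i])) {
        // Empty entry (e.g. from the split between "\r" and "\n"): ignore it.
    } else {
        // Any other entry ends the group once matching has started.
        if ($matching) {
            break;
        }
    }
}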

echo "<pre>";
print_r(array_reverse($result));
echo "</pre>";

yields:

Array
(
    [0] => http://2.de
    [1] => http://3.de
)
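
To check the idea against the two examples from the question, the steps can be wrapped in a function. This is only a sketch under a few assumptions: the name extractLastLinkGroup is made up for illustration, the sample texts are shortened versions of the question's examples, and the pattern also accepts "https://", which the question did not ask for.

```php
<?php
/**
 * Returns the last run of consecutive links in $text, in original order.
 * Hypothetical helper: the name and the https support are assumptions.
 */
function extractLastLinkGroup(string $text): array
{
    // Split on runs of whitespace and drop empty tokens.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $group = [];

    // Walk backwards: collect links, stop at the first non-link after one.
    for ($i = count($tokens) - 1; $i >= 0; $i--) {
        if (preg_match('@^https?://@i', $tokens[$i])) {
            $group[] = $tokens[$i];
        } elseif ($group !== []) {
            break;
        }
    }

    // Links were collected last-to-first, so restore the original order.
    return array_reverse($group);
}

// Shortened versions of Example 1 and Example 2 from the question.
$example1 = "Lorem ipsum dolor sit amet.\n\nhttp://sample.com/12345.png\n\n"
          . "In pharetra elementum dui vel pretium.\n\nhttp://sample.com/13578.png\n";
$example2 = "Lorem ipsum dolor sit amet.\n\nhttp://sample.com/12345.png\n\n"
          . "In pharetra elementum dui vel pretium.\n\n"
          . "http://sample.com/24689.png\nhttp://sample.com/13578.png\nhttp://sample.com/98761.png\n";

print_r(extractLastLinkGroup($example1));
print_r(extractLastLinkGroup($example2));
```

For the first sample this returns only the 13578 link, and for the second the three links 24689, 13578 and 98761 in that order. Because the split is on any whitespace, links separated by spaces on one line also count as consecutive; if only links on their own lines should qualify, splitting on line breaks instead would be the change to make.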
dognose