3

This has surely been asked before, but I searched here on SO and found nothing: https://stackoverflow.com/search?q=php+parse+between+words

I have a string and want to get an array of all the words contained between two delimiters (two words). I am not confident with regex, so I ended up with the solution below, but it is not appropriate because I need every match that meets those requirements, not only the first one.

$start_limiter = 'First';
$end_limiter = 'Second';
$haystack = $string;

# Step 1. Find the start limiter's position

$start_pos = strpos($haystack,$start_limiter);
if ($start_pos === FALSE)
{
    die("Starting limiter ".$start_limiter." not found in ".$haystack);
}

# Step 2. Find the ending limiter's position, searching only after the start limiter

$end_pos = strpos($haystack, $end_limiter, $start_pos + strlen($start_limiter));

if ($end_pos === FALSE)
{
    die("Ending limiter ".$end_limiter." not found in ".$haystack);
}

# Step 3. Extract the string between the two limiters.
# The content starts right after the start limiter, and its length is the
# end limiter's position minus that content start.
$content_start = $start_pos + strlen($start_limiter);
$needle = substr($haystack, $content_start, $end_pos - $content_start);

echo "Found $needle";

I also thought about using explode(), but I think a regex could be cleaner and faster.

Giorgio
  • Just came back here since this question has passed the 1000 views mark. Just wondering why it got a negative vote: it showed my effort in searching for similar questions and I provided my code. – Giorgio Nov 02 '14 at 12:48

5 Answers

8

I'm not very familiar with PHP, but it seems to me that you can use something like:

if (preg_match("/(?<=First).*?(?=Second)/s", $haystack, $result))
    print_r($result[0]);

(?<=First) looks behind for First but doesn't consume it,

.*? lazily matches everything in between First and Second (as few characters as possible),

(?=Second) looks ahead for Second but doesn't consume it,

The s at the end is to make the dot . match newlines if any.
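
To make the effect of the zero-width assertions and the s modifier concrete, here is a small sketch. The sample string and the variable names are made up for illustration; only preg_match itself comes from the answer.

```php
<?php
// Hypothetical sample text with a newline between the two delimiters.
$haystack = "First alpha\nbeta Second";

// Lookarounds are zero-width, so the delimiters are not part of the match.
preg_match('/(?<=First).*?(?=Second)/s', $haystack, $lookaround);

// Without lookarounds the delimiters are consumed and show up in the match.
preg_match('/First.*?Second/s', $haystack, $consumed);

// Without the s modifier the dot cannot cross the newline, so nothing matches.
$found_without_s = preg_match('/(?<=First).*?(?=Second)/', $haystack);

var_dump($lookaround[0]);
var_dump($consumed[0]);
var_dump($found_without_s);
```

If the text between the delimiters can never span several lines, the s modifier makes no difference and can be dropped.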


To get all the text between those delimiters, you use preg_match_all and you can use a loop to get each element:

if (preg_match_all("/(?<=First)(.*?)(?=Second)/s", $haystack, $result)) {
    // $result[1] holds the text captured by group 1, one entry per match
    foreach ($result[1] as $match) {
        echo $match, "\n";
    }
}
Jerry
  • Much cleaner regex than mine! `preg_match` has no return value other than TRUE or FALSE. The third parameter of the function is the output as an array. – phpisuber01 Aug 12 '13 at 18:37
  • @phpisuber01 Oh, okay, thanks for the info. :) I'll edit it then. – Jerry Aug 12 '13 at 18:38
  • Thanks for your code, but it doesn't work as expected. If there is more than one pair of starting and ending words, it parses only the first one. Suppose $start = "A"; $end = "B"; and $subject = "A Hello B . A How B - A are B , A youB"; it returns only "Hello", whereas I'd like to have $match = array("Hello", "How", "Are", "You"); – Giorgio Aug 12 '13 at 21:05
  • @Giorgio Well, if you want to find all those you have to use `preg_match_all`. The method you used in your question pointed to the fact that there was only one `starting_pos` and one `end_pos`, thus why I used `preg_match`... After that, you can run a loop to check each element of the array. I edited the code. – Jerry Aug 13 '13 at 06:02
3

Not sure that the result will be faster than your code, but you can do it like this with regex:

$pattern = '~(?<=' . preg_quote($start, '~') 
         . ').+?(?=' . preg_quote($end, '~') . ')~si';
if (preg_match($pattern, $subject, $match))
    print_r($match[0]);

I use preg_quote to escape all characters that have a special meaning in a regex (like +*|()[]{}.? and the pattern delimiter ~)

(?<=..) is a lookbehind assertion that checks the substring before what you want to find.
(?=..) is a lookahead assertion (the same thing for what comes after).
.+? matches any character one or more times, but as few times as possible (the question mark makes the quantifier lazy).

s allows the dot to match newlines (not the default behavior).
i makes the search case insensitive (you can remove it if you don't need it).
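
Combining the pattern above with preg_match_all gives every match rather than only the first. The following is a minimal sketch, assuming a helper named find_all_between (a made-up name, not a built-in), case-sensitive matching (the i modifier is left out), and that surrounding whitespace should be trimmed from each result.

```php
<?php
// find_all_between() is a hypothetical helper name used for illustration.
function find_all_between(string $subject, string $start, string $end): array
{
    // preg_quote escapes regex metacharacters and the ~ pattern delimiter.
    $pattern = '~(?<=' . preg_quote($start, '~')
             . ').+?(?=' . preg_quote($end, '~') . ')~s';

    // preg_match_all fills $matches[0] with every full match, in order.
    if (!preg_match_all($pattern, $subject, $matches)) {
        return [];
    }

    // Trim the whitespace left over next to the delimiters.
    return array_map('trim', $matches[0]);
}

$subject = "A Hello B . A How B - A are B , A youB";
print_r(find_all_between($subject, "A", "B"));
```

With the sample string from the comments this returns the four words Hello, How, are and you. Drop the array_map call if the surrounding spaces matter.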

Casimir et Hippolyte
  • Thanks for your code, but it doesn't work as expected. If there is more than one pair of starting and ending words, it parses only the first one. Suppose $start = "A"; $end = "B"; and $subject = "A Hello B . A How B - A are B , A youB"; it returns only "Hello", whereas I'd like to have $match = array("Hello", "How", "Are", "You"); – Giorgio Aug 12 '13 at 21:04
  • @Giorgio: if you want all results, replace `preg_match` by `preg_match_all`. – Casimir et Hippolyte Aug 13 '13 at 00:06
3

This function lets you reuse the same logic with different parameters, so you don't have to rewrite this bit of code every time. It also uses strpos, as your code does. It has been working great for me.

function get_string_between($string, $start, $end){
    $ini = strpos($string, $start);
    if ($ini === false) return "";      // start delimiter not found
    $ini += strlen($start);             // skip past the start delimiter
    $stop = strpos($string, $end, $ini);
    if ($stop === false) return "";     // end delimiter not found
    return substr($string, $ini, $stop - $ini);
}

$fullstring = 'This is a long set of words that I am going to use.';

$parsed = get_string_between($fullstring, 'This', "use");

echo $parsed;

Will output:

is a long set of words that I am going to
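
Since the question asks for every occurrence, the same strpos idea can be put in a loop. This is only a sketch: get_all_strings_between is a made-up name, and it assumes both delimiters are non-empty strings.

```php
<?php
// get_all_strings_between() is a hypothetical name for this sketch.
// Assumes $start and $end are non-empty.
function get_all_strings_between(string $string, string $start, string $end): array
{
    $found = [];
    $offset = 0;

    // Keep searching from where the previous match ended.
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) {
            break; // no closing delimiter left
        }
        $found[] = substr($string, $ini, $stop - $ini);
        $offset = $stop + strlen($end);
    }

    return $found;
}

print_r(get_all_strings_between("A Hello B . A How B - A are B , A youB", "A", "B"));
```

The results keep the spaces next to the delimiters; wrap the call in array_map('trim', ...) if those are unwanted.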
KevBot
2

Here's a simple example for finding everything between the words 'mega' and 'yo' for the string $t.

PHP Example

$t = "I am super mega awesome-sauce, yo!";

$arr = [];
preg_match("/mega\ (.*?)\ yo/ims", $t, $arr);

echo $arr[1];

PHP Output

awesome-sauce,
phpisuber01
  • Thanks for your code, but it doesn't work as expected. If there is more than one pair of starting and ending words, it parses only the first one. Suppose $start = "A"; $end = "B"; and $subject = "A Hello B . A How B - A are B , A youB"; it returns only "Hello", whereas I'd like to have $match = array("Hello", "How", "Are", "You"); – Giorgio Aug 12 '13 at 21:08
0

You can also use two explode statements.

For example, say you want to get "z" in y=mx^z+b. To get z:

$formula="y=mx^z+b";
$z=explode("+",explode("^",$formula)[1])[0];

First I get everything after ^: explode("^",$formula)[1]

Then I get everything before +: explode("+",$previousExplode)[0]
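
The two-explode idea can also be stretched to collect every occurrence. A rough sketch, assuming a helper named explode_all_between (hypothetical), non-empty delimiters, and a second formula added to the sample string purely for illustration:

```php
<?php
// explode_all_between() is a hypothetical helper name.
// Assumes $start and $end are non-empty.
function explode_all_between(string $subject, string $start, string $end): array
{
    $found = [];

    // Every chunk after the first one begins right after a start delimiter.
    $chunks = array_slice(explode($start, $subject), 1);

    foreach ($chunks as $chunk) {
        $parts = explode($end, $chunk, 2);
        // Keep the chunk only if the end delimiter actually occurs in it.
        if (count($parts) === 2) {
            $found[] = $parts[0];
        }
    }

    return $found;
}

print_r(explode_all_between("y=mx^z+b and x^2+1", "^", "+"));
```

Note that this splits on every start delimiter, so a start delimiter repeated before the end delimiter yields a different result than the strpos or regex approaches.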

Josh Powlison
  • Sorry, just read that you already thought about using explode (I know this is an old thread). Maybe this will help somebody else. – Josh Powlison Jan 14 '16 at 14:28