
I am using the code below to truncate my content before and after the first occurrence of the search keyword in my text (this is for my search page). Everything works as it should, except that the code cuts words in half at the beginning of the excerpt. It doesn't cut words at the end of the excerpt.

Example:

lients at the centre of the relationship and to offer a first class service to them, which includes tax planning, investment management and estate planning. We believe that our customer focused and...

(Edit: sometimes more than one character is missing from the word.)

You can see that it has chopped the 'c' off 'clients'. It only happens at the beginning of the text, not the end. How can I fix this? I believe I am halfway there. Code so far:

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}

$results[] = Array(
    'text' => neatest_trim($row->content, 200, $searchquery, 120, 80)
);
hairynuggets

2 Answers

The 120 characters that you keep before the keyword are counted without checking whether the cut lands on a space or in the middle of a word, so the string is cut at that position no matter what.

I would make this change, which moves the start forward to the first space at or after the computed position:

$start = $characters_before < $pos ? $pos - $characters_before : 0;
// add this line:
$start = strpos($content, ' ', $start);
$len = $pos + strlen($searchquery) + $characters_after - $start;

This way $start is the position of a space rather than a letter in the middle of a word.

Your function would become:

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $start = strpos($content, " ", $start);
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}
Mark Cameron
  • I'm still getting more or less the same thing. Any other ideas? I'm pulling my hair out, haha. – hairynuggets Oct 31 '11 at 10:33
  • Unless you don't want to truncate anything from the beginning of the content, this should work. I've tested it locally, and it doesn't cut any words at the start... – Mark Cameron Oct 31 '11 at 10:37
  • It almost works, apart from some text. For instance, 'sing profile Any UK' should have started with 'rising profile'. Thoughts? – hairynuggets Oct 31 '11 at 10:51
  • Odd, I cannot replicate this problem. Have you tried copying the full function that I put above and replacing yours with it? None of my tests, even with "rising profile", truncate words... – Mark Cameron Oct 31 '11 at 10:55
  • Removed $content = str_replace("\n", '', $content); and it seems to be working... – hairynuggets Oct 31 '11 at 11:04
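The last comment hints at a likely cause: $pos and $start are measured on the raw content, but mb_substr() is applied after &nbsp; entities, newlines and tags have been removed, so the offsets no longer line up with the text being cut. A minimal sketch of an alternative ordering, assuming the mbstring extension is available, is to clean the text first and only then compute positions. The function name snippet_around and its parameter names are hypothetical, not from the original post.

```php
<?php
// Hypothetical helper (not the original neatest_trim): clean first, measure second.
function snippet_around(string $content, string $query, int $before, int $after): string
{
    // Normalise before computing any offsets so positions match the final text.
    $text = str_replace('&nbsp;', ' ', $content);
    $text = strip_tags($text);
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Case-insensitive search on the cleaned text.
    $pos = mb_stripos($text, $query);
    if ($pos === false) {
        return $text; // keyword not found: nothing to centre the snippet on
    }

    $start = max(0, $pos - $before);
    if ($start > 0) {
        // Look for a space from one character earlier, so a start that already
        // sits on a word boundary is kept; otherwise skip the partial word.
        $space = mb_strpos($text, ' ', $start - 1);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }

    $len = $pos + mb_strlen($query) + $after - $start;
    $snippet = mb_substr($text, $start, $len);

    // Drop the trailing (possibly partial) word only if the end was actually cut.
    if ($start + $len < mb_strlen($text)) {
        $snippet = preg_replace('/\s+\S*$/', '', $snippet);
    }

    return $snippet . '...';
}
```

Highlighting with str_ireplace() can then be applied to the returned snippet as in the original function. Collapsing whitespace to a single space, rather than deleting newlines outright, also avoids gluing together words that were separated only by a line break.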

Why not just use a regex replace?

$result = preg_replace('/.*(.{10}\bword\b.{10}).*/s', '$1', $subject);

This trims away everything except the keyword 'word' and the 10 characters on either side of it.

Explanation:

# .*(.{10}\bword\b.{10}).*
# 
# Options: dot matches newline
# 
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference number 1 «(.{10}\bword\b.{10})»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
#    Assert position at a word boundary «\b»
#    Match the characters “word” literally «word»
#    Assert position at a word boundary «\b»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

So this regex finds the word that you specify (and only that word on its own, because it is enclosed in \b word boundaries), and it captures the word together with the 10 characters before it and the 10 characters after it. You could construct the regex yourself with variables for the number of characters before and after, and of course for the keyword. The regex also matches everything else, but the replacement only uses backreference $1, which is what you want as the output.
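Building that pattern from variables could look like the sketch below. It is a variation rather than the exact pattern above: it uses {0,n} instead of {n} so that a keyword close to the start or end of the text still matches, a lazy leading .*? so the first occurrence is used, and preg_quote() so that special characters in the keyword are escaped. The function name regex_excerpt and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: builds the excerpt pattern from variables.
function regex_excerpt(string $subject, string $keyword, int $before, int $after): string
{
    $pattern = sprintf(
        '/.*?(.{0,%d}\b%s\b.{0,%d}).*/su',
        $before,
        preg_quote($keyword, '/'), // escape regex metacharacters and the delimiter
        $after
    );

    // preg_replace() returns null on error (for example invalid UTF-8 with /u);
    // fall back to the untouched subject in that case.
    return preg_replace($pattern, '$1', $subject, 1) ?? $subject;
}

// Usage: keep up to 10 characters either side of "centre".
echo regex_excerpt(
    "We put clients at the centre of the relationship and offer a first class service.",
    'centre',
    10,
    10
);
```

Note that this counts characters only, so the excerpt can still begin or end in the middle of a word. If the keyword is not found, the subject is returned unchanged.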

FailedDev