5

* Updated question from revo's answer

Here is the working script with a better set of example strings to show my intent:

$strings[] = 'seventy five yards out';
$strings[] = 'sixty yards out';
$strings[] = 'one hundred fifty yards out';

$inputString = 'seventy two yards out';
$inputWords = str_word_count($inputString, 1);

$foundWords = [];

foreach ($strings as $key => $string) {
    $stringWords = str_word_count($string, 1);
    $wordsCount = array_count_values($stringWords);
    $commonWords = array_intersect($inputWords, array_keys($wordsCount));
    if (count($commonWords) > 0) {
        foreach ($commonWords as $commonWord) {
            $foundWords[$key][$commonWord] = $wordsCount[$commonWord];
        }
    }
}

print_r($foundWords);

How would I get it to print 'seventy five yards out', since it is actually the closest to the input text? I was thinking of dividing by the word count to get a percentage, but now think that might not work.

Ryan D

3 Answers

2

Something like this should work:

<?php

$g = 'the weather is nice'; // strings to loop through
$n = 'the water is blue';
$b = 'that was a bad movie';

$t = 'hows the weather';  // example input
$test = (str_word_count($t, 1)); // breaks out each word into array

// Comparisons
$comps = array();
// Array sums
$sums = array();
// Search each variable that's been set, as long as it's less than 't'
// A "for" loop will accept letters in addition to numbers, so we'll start with the
// letter "a" and loop through each letter up to "s" (which is one less than "t")
for ($inc = 'a'; $inc < 't'; $inc++) {
  // Now, a variable assigned as $$inc will translate into $a, $b, $c ... $s
  // and if $a, $b, $c, etc, are set...
  if (isset($$inc)) {
    // ... assign them to the $comps array with a key of $$inc
    $comps[$$inc] = str_word_count($$inc, 1);

    // For example, when the "for" loop reaches "f", nothing will be added to the
    // $comps array because $f is not set above.

    // But when it gets to "g" it'll find that $g HAS been set, and that it has a
    // value of "the weather is nice". At this point the $comps array will now look
    // like this:
    // $comps['the weather is nice'] = array('the', 'weather', 'is', 'nice');

    // If you'd like to see this in action (since it might sound a little confusing),
    // remove the # from the beginning of each of the following lines that start with #
    // (there should be 10 total):

    #print "<pre>The loop has reached the letter <b>{$inc}</b> for the value of ";
    #print "<b>\$inc</b> and has found that <b>\${$inc}</b> HAS been set in the code.\n";
    #print "Adding another dollar sign to <b>\$inc</b> has had the following effects:\n";
    #print "- <b>\$inc</b> now looks like <b>\$\$inc</b> (from within the written part of the code)\n";
    #print "- <b>\$\$inc</b> translates into <b>\${$inc}</b> (the variable that is actually being evaluated)\n";
    #print "- <b>\${$inc}</b> evaluates to <b>{$$inc}</b>\n</pre>";
  }
  #else {
  #  print "<pre>The loop has reached the letter <b>{$inc}</b> for the value of <b>\$inc</b>";
  #  print " and has found that <b>\${$inc}</b> has NOT been set in the code, so it's being skipped.\n";
  #}
}
// Avoid errors by checking if empty or not
if (!empty($comps)) {
  foreach ($comps as $key => $comp) {
    // Find intersections, if any
    $candidates[$key] = array_intersect($test, $comp);
    // Count the intersections
    $counts[$key] = array_count_values($candidates[$key]);
    // Add up the intersections
    $sums[$key] = array_sum($counts[$key]);
  }
}
$winner = '';
if (!empty($sums)) {
  // Reverse sort $sums, putting the highest value first
  arsort($sums);
  // Flip $sums so we can extract the key
  $flipped = array_flip($sums);
  // Extract the first key off of $sums
  $winner = array_shift($flipped);
}

print $winner;
jerdiggity
  • Yes this works great but you kinda lost me on how it works, where is it looping through $g, $b and $n? Sorry im new to this.. Thanks! @jerdiggity – Ryan D Jul 23 '16 at 19:15
  • @RyanD It searches through $g, $b and $n in the for loop: `for ($inc = 'a'; $inc < 't'; $inc++) { if (isset($$inc)) { $comps[$$inc] = str_word_count($$inc, 1); } }`. This is called a variable variable: http://stackoverflow.com/questions/2715654/what-does-dollar-dollar-or-double-dollar-mean-in-php – MikeF Jul 23 '16 at 20:34
  • @RyanD I updated my answer with a little more explanation... Hopefully it clears things up. :) – jerdiggity Jul 23 '16 at 20:46
  • Wow great explanation! Thank you @jerdiggity – Ryan D Jul 23 '16 at 20:48
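
For readers who are new to variable variables, here is a minimal sketch of the two mechanisms the loop in this answer relies on. The names `$name`, `$value` and `$letter` are made up for illustration and are not part of the answer's code.

```php
<?php
$g = 'the weather is nice';

// $name holds the *name* of another variable, not its value.
$name = 'g';

// $$name is resolved in two steps: $name gives 'g', so $$name means $g.
$value = $$name;

// Incrementing a single-letter string moves to the next letter,
// which is how the "for" loop walks from 'a' up to 's'.
$letter = 'a';
$letter++;

print $value . "\n";   // the weather is nice
print $letter . "\n";  // b
```

In practice, keeping the candidate strings in one array (as the other answers do) avoids scanning variable names altogether.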
2

The key is to run str_word_count() on each provided string separately. This turns each string into an array of words, and arrays are much simpler to work with for what you want.

array_count_values() counts the values in an array, which gives the number of occurrences of each word.
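
As a small illustration of what these two functions return, here is a sketch using a made-up sentence (`$sentence` and its contents are assumptions for this example, not part of the question):

```php
<?php
// Illustrative input with one repeated word.
$sentence = 'out out seventy yards';

// Format 1 makes str_word_count() return the words as a list.
$words = str_word_count($sentence, 1);
// ['out', 'out', 'seventy', 'yards']

// array_count_values() maps each distinct word to its number of occurrences.
$counts = array_count_values($words);
// ['out' => 2, 'seventy' => 1, 'yards' => 1]

print_r($counts);
```

The keys of `$counts` are the distinct words, which is why the script below intersects the input words with `array_keys($wordsCount)`.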

$strings[] = 'seventy five yards out';
$strings[] = 'sixty yards out';
$strings[] = 'one hundred fifty yards out';

$inputString = 'seventy two yards out';
$inputWords = str_word_count($inputString, 1);

$probabilities = [];

foreach ($strings as $key => $string) {
    $stringWords = str_word_count($string, 1);
    $wordsCount = array_count_values($stringWords);
    $commonWords = array_intersect($inputWords, array_keys($wordsCount));
    if (count($commonWords) > 0) {
        foreach ($commonWords as $commonWord) {
            if (!isset($probabilities[$key])) $probabilities[$key] = 0;
            $probabilities[$key] += $wordsCount[$commonWord];
        }
        $probabilities[$key] /= count($stringWords);
    }
}
arsort($probabilities);
echo $strings[key($probabilities)];

Output:

seventy five yards out

Probabilities, printed with print_r($probabilities);:

Array
(
    [0] => 0.75
    [1] => 0.66666666666667
    [2] => 0.4
)

revo
  • You are welcome. Also, since you accepted jerdiggity's answer, you should ask him to modify it, because the number of repeated words within the strings is not taken into account in his code. @RyanD – revo Jul 23 '16 at 19:41
  • Ya thats kinda what I replied to him I didnt see where they were getting taken into account, thought I was just missing something.. – Ryan D Jul 23 '16 at 19:43
  • How would I take this further to print the winning sentence? I am not good with arrays – Ryan D Jul 23 '16 at 19:54
  • What are your factors for a *winning sentence*? @RyanD – revo Jul 23 '16 at 20:00
  • the most matched words based on the amount of words provided.. So if there are 3 matches for string one and 3 matches for string 2 it would count the amount of words in each string divided by the amount of matches to give the highest probability of a matching string..If that makes any sense.. i do not have specific application for this yet just trying to learn it.. Thank you! @revo – Ryan D Jul 23 '16 at 20:03
  • Bam! way to go @revo thats perfect! except I cannot print the probabilities, just says array.. – Ryan D Jul 23 '16 at 20:29
  • I updated again based on your last update. Also it is better to revert your edit to the original question which was a real question. @RyanD – revo Jul 23 '16 at 20:34
0

At first, your question asked for the number of occurrences as well. Since you have clearly moved beyond that, I thought I should offer another solution.

Use the similar_text() function, which reports how similar two strings are as a percentage:

$strings[] = 'sixty yards out';
$strings[] = 'seventy five yards out';
$strings[] = 'one hundred fifty yards out';

$inputString = 'seventy two yards out';

$p = 0;
$k = null;
foreach ($strings as $key => $string) {
    similar_text($inputString, $string, $percent);
    if ($percent > $p) {
        $p = $percent;
        $k = $key;
    }
}

echo !is_null($k) ? $strings[$k] : "";

Output:

seventy five yards out

revo