8

Consider:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t

An alphabet i_index is adjacent to another alphabet j_index in the tile if i_index is next to j_index in any of the following positions:

* * *
* x *
* * *

Here all the * indicates the location which are adjacent to x.

The task is to find a given string in the tile. The condition is that all the characters of the given string should be adjacent, and no one character in the tile may be used more than once to construct the given string.

I have came up with a simply backtracking solution, for which the solutions are pretty fast, but the worst case time is really worse.

For an example: Say the tile has 4x4 filled with all a's , therefore 16 a's, and the string to find is aaaaaaaaaaaaaaab, that is, 15 a's and one b . One what is to eliminate strings with characters which does not appear in the tile. But still worst case can still appear with say the tile have abababababababab and the string to find is abababababababbb .

My attempt is like this:

https://ideone.com/alUPf:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX 5

int sp (char mat[MAX][MAX], char *pat, int c, int i, int j)
{
  int r = 0;
  char temp;


  if (c == strlen (pat))
    return 1;
  if (((i<0) || (j<0)) || (i>=MAX) || (j>=MAX))
        return 0;
  if (mat[i][j] != pat[c])
    return 0;
  if (isupper (mat[i][j]))
    return 0;


  /* Save character and mark location to indicate
   * DFS has visited this node, to stop other branches
   * to enter here and cross over path
   */
  temp = mat[i][j];
  mat[i][j] = 0;

  r |= sp (mat, pat, c+1, i-1, j-1);
  r |= sp (mat, pat, c+1, i-1, j);
  r |= sp (mat, pat, c+1, i-1, j+1);
  r |= sp (mat, pat, c+1, i, j+1);
  r |= sp (mat, pat, c+1, i+1, j+1);
  r |= sp (mat, pat, c+1, i+1, j);
  r |= sp (mat, pat, c+1, i+1, j-1);
  r |= sp (mat, pat, c+1, i, j-1);

  /* restore value */
  mat[i][j] = temp;

  /* mark if success */
  if ((mat[i][j] == pat[c]) && (r == 1))
    mat[i][j] = toupper (mat[i][j]);

  return r;
}

/* Testing the `sp` module */
int main (void)
{
  char mat[MAX][MAX] = {
                     {'a', 'c', 'p', 'r', 'c'},
                     {'x', 's', 'o', 'p', 'c'},
                     {'v', 'o', 'v', 'n', 'i'},
                     {'w', 'g', 'f', 'm', 'n'},
                     {'q', 'a', 't', 'i', 't'}
                   };
  char pat[] = "microsoft";
  int i, j;

  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
      printf ("%c ", mat[i][j]);
    printf ("\n");
  }

  for (i=0; i<5; i++)
    for (j=0; j<5; j++)
      sp (mat, pat, 0, i, j);

  printf ("\n\n\n");
  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
    {
      if (isupper (mat[i][j]))
        printf ("%c ", mat[i][j]);
      else
        printf (". ");
    }
    printf ("\n");
  }
  printf ("\n");
  return 0;
}

which prints:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t 



. . . R . 
. S O . C 
. O . . I 
. . F M . 
. . T . . 

The function sp does the work, performs the back tracking.

Is there a better way ? or is it possible to lower the worst case time ?

phoxis
  • 52,327
  • 12
  • 74
  • 110

3 Answers3

4

There is no polynomial algorithm, so I don't think you can get much better...

It is possible to encode any grid graph (a planar graph with nodes with degree <= 4) using a letter matrix. The following grid

0-0-0
| | |
0 0-0
| | 
0-0-0

Can be converted by turning edges into 'a's, vertices into 'b's and empty spaces into 'z's

B a B a B  
a z a z a  
B z B a B  
a z a z z  
B a B a B 

Looking for a hamiltonian path in the original graph is equivalent to searching for the string BaBaBaBaBaBaBaBaB (with all 9 Bs). But the Hamiltonian path problem for grids in NP-complete* so the word searching problem is NP-hard.

Since a word path is clearly a polynomial certificate, the word searching problem on matrices is NP-complete.


*I remember seeing a proof for this a while ago and Wikipedia confirms, but without linking to a reference >:/


I'm pretty sure theres more on this problem out there. I just pulled this proof out of my ass and I'm pretty sure were not the first ones to look at the problem. At the least there is a good chance for nice heuristics on the non-degenerate cases you get in a real magazine puzzle...

hugomg
  • 63,082
  • 19
  • 144
  • 230
  • Didn't notice that Mr. Hamilton was hiding in the puzzle. This is going to help me to get more into it. Thanks for the graph model. Yep definitely we are not the first one, lets see. – phoxis Oct 01 '11 at 12:55
1

One simple improvement is to check the value of r after each call to sp. If r == 1 then stop calling sp. You found your solution. This is unless you need to find all possible solutions.

Something like this (logical OR operator does not calculate second operand if first is true):

r = sp (mat, pat, c+1, i-1, j-1)) ||
  sp (mat, pat, c+1, i-1, j) ||
  sp (mat, pat, c+1, i-1, j+1) ||
  sp (mat, pat, c+1, i, j+1) ||
  sp (mat, pat, c+1, i+1, j+1) ||
  sp (mat, pat, c+1, i+1, j) ||
  sp (mat, pat, c+1, i+1, j-1) ||
  sp (mat, pat, c+1, i, j-1) ? 1 : 0;
Dialecticus
  • 15,040
  • 5
  • 36
  • 88
  • yes, such simple coding tweaks could be done, for example, if the length of the string is greater than nxn then stop the process at once, etc, but this will not reduce the worst case time. I want to investigate on an algorithmic improvement. – phoxis Oct 01 '11 at 12:57
  • @phoxis, please make an experiment by replacing the code with one given here. I'm curious about the result. – Dialecticus Oct 02 '11 at 12:47
  • basically the same, but it will simply won't call the next functions if one of them returns true. I have thought about such possibility, but in this case not all the occurrences would be found, as after one match, it won't call the next ones. Definitely, if the first call returns true, this will stop searching a lot of the space. thanks for the update, it will definitely make improvement if only one match is to be found, but won't improve the worst case. – phoxis Oct 02 '11 at 17:51
0

I think you might find that focusing on the worst case is counterproductive here, because there is no real improvement to be made. However, there are many useful improvements to be made in "real world" cases. For example, if the words are always drawn from a dictionary, if they may be limited in length (or have a natural distribution of lengths, statistically speaking). For small grids you could imagine searching them in advance to find all words from a dictionary, storing the list in a hashmap, and then performing a simple lookup as words need to be "tested". All the time goes into building your index, but that may be acceptable if the grid rarely changes.

Mike Sokolov
  • 6,565
  • 2
  • 19
  • 30