I've written these functions (which work) to find the longest common subsequence of two strings.

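from collections import defaultdict

def lcs_grid(xs, ys):
    # grid[i][j] = (LCS length of xs[:i+1] and ys[:j+1], move that led here).
    # Missing cells, including those at index -1, default to (0, "").
    grid = defaultdict(lambda: defaultdict(lambda: (0,"")))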
    for i,x in enumerate(xs):
        for j,y in enumerate(ys):
            if x == y:
                grid[i][j] = (grid[i-1][j-1][0]+1,'\\')
            else:
                if grid[i-1][j][0] > grid[i][j-1][0]:
                    grid[i][j] = (grid[i-1][j][0],'<')
                else:
                    grid[i][j] = (grid[i][j-1][0],'^')

    return grid

def lcs(xs,ys):
    grid = lcs_grid(xs,ys)
    i, j = len(xs) - 1, len(ys) - 1

    best = []
    length,move = grid[i][j]
    while length:
        if move == '\\':
            best.append(xs[i])
            i -= 1
            j -= 1
        elif move == '^':
            j -= 1
        elif move == '<':
            i -= 1
        length,move = grid[i][j]

    best.reverse()
    return best

Does anybody have a suggestion for modifying the functions so that they can print the longest common subsequence of three strings? That is, the function call would be: lcs(str1, str2, str3)

So far I have managed it with reduce, but I'd like to have a function that really prints out the subsequence without using reduce.

Bill the Lizard
MarkF6

1 Answer

To find the longest common substring of D strings, you cannot simply use reduce, since the longest common substring of three strings does not have to be a substring of the LCS of any two of them. Counterexample:

a = "aaabb"
b = "aaajbb"
c = "cccbb"

In the example, LCS(a,b) = "aaa" and LCS(a, b, c) = "bb". As you can see, "bb" is not a substring of "aaa".
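The same pitfall applies to the subsequence variant asked about in the question. Here is a minimal sketch, assuming a pairwise helper `lcs2` and a checker `is_subsequence` (both hypothetical names, written only for this illustration) and inputs chosen so that the pairwise LCS is unique:

```python
from functools import reduce

def lcs2(xs, ys):
    """Pairwise longest common subsequence (hypothetical helper for this demo)."""
    # table[i][j] = LCS length of xs[:i] and ys[:j]
    table = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i, x in enumerate(xs, 1):
        for j, y in enumerate(ys, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    out, i, j = [], len(xs), len(ys)
    while i and j:
        if xs[i - 1] == ys[j - 1]:
            out.append(xs[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

def is_subsequence(sub, s):
    """True if the characters of sub appear in s in the same order."""
    it = iter(s)
    return all(ch in it for ch in sub)

# Illustrative inputs (not from the original post).
a, b, c = "bbaaa", "aaabb", "bb"

folded = reduce(lcs2, [a, b, c])   # lcs2(lcs2(a, b), c)
print(repr(lcs2(a, b)))            # 'aaa'
print(repr(folded))                # ''
print(all(is_subsequence("bb", s) for s in (a, b, c)))  # True
```

The pairwise step keeps only "aaa", so the fold ends up empty even though "bb" is common to all three strings.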

In your case, since you implemented the dynamic programming version, you have to build a D-dimensional grid and adjust the algorithm accordingly.
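For D = 3 and the subsequence problem, the grid becomes a three-dimensional table. The following is only a sketch of that idea, not a drop-in replacement for the functions in the question: the name `lcs3` is hypothetical, it uses plain nested lists instead of `defaultdict`, and it needs time and memory proportional to `len(xs) * len(ys) * len(zs)`.

```python
def lcs3(xs, ys, zs):
    """Longest common subsequence of three sequences (illustrative sketch)."""
    nx, ny, nz = len(xs), len(ys), len(zs)
    # table[i][j][k] = LCS length of xs[:i], ys[:j] and zs[:k]
    table = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            for k in range(1, nz + 1):
                if xs[i - 1] == ys[j - 1] == zs[k - 1]:
                    # All three agree: extend the diagonal neighbour.
                    table[i][j][k] = table[i - 1][j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last element of one sequence.
                    table[i][j][k] = max(table[i - 1][j][k],
                                         table[i][j - 1][k],
                                         table[i][j][k - 1])

    # Walk back from the far corner to recover one LCS.
    best = []
    i, j, k = nx, ny, nz
    while i and j and k:
        if xs[i - 1] == ys[j - 1] == zs[k - 1]:
            best.append(xs[i - 1])
            i -= 1
            j -= 1
            k -= 1
        elif table[i - 1][j][k] == table[i][j][k]:
            i -= 1
        elif table[i][j - 1][k] == table[i][j][k]:
            j -= 1
        else:
            k -= 1

    best.reverse()
    return best

print(lcs3("aaabb", "aaajbb", "cccbb"))  # ['b', 'b']
```

Like the two-string `lcs` in the question, this returns a list of elements; `"".join(...)` turns it back into a string.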

You might want to look at suffix trees, which should make things faster; see Wikipedia. Also look at this Stack Overflow question.

mensi
  • Thank you very much. A question about the Stack Overflow link: there, a substring is returned. But as you noticed, I'd like to return the subsequence. (The difference, for example with "Monday" and "Today": subsequence: "o", "d", "a", "y"; substring: "day".) So what should be different in the link's code so that I receive the subsequence (instead of the substring)? – MarkF6 May 25 '12 at 06:07
  • @mensi: LCS(a,b) would not be "aaa" but "aaabb". – Matthias May 25 '12 at 06:14
  • @Matthias: true :) I didn't notice. But LCS(a,b,c) is correct ;) – MarkF6 May 25 '12 at 06:15
  • PS: The Wikipedia link is about the substring problem. But what I mean is really the subsequence problem. – MarkF6 May 25 '12 at 06:19
  • Hadn't thought this through enough when I posted my comment. +1. – Fred Foo May 25 '12 at 08:18
  • @Matthias In my defense, it was late here when I wrote that ;) – mensi May 25 '12 at 08:18
  • [Link](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem) to LCSubsequence on Wikipedia. – Rock Jun 03 '12 at 15:35