Suppose there is a finite set $A$ containing $n$ elements. One can construct a sequence of finite length:

$$\{a_i\},\ a_i \in A,\ i\in \mathbb N,\ i\le N$$

This sequence contains $2^N$ subsequences of the form:

$$\{ a_{i_k}\}, k\in \mathbb N,\ k\le M\le N,\ i_k>i_{k-1}$$

Such a subsequence is palindromic iff $a_{i_{M+1-k}} = a_{i_k}\ \forall k$. The length of the longest such palindromic subsequence can thus be defined as:

$$M_p\{a_i\} = \max\{|\{a_{i_k}\}|:a_{i_{M+1-k}} = a_{i_k}\ \forall k\}$$

Finding $M_p$ is a well-known dynamic programming problem, but after implementing it I found some interesting behavior. The average proportional length of the longest palindromic subsequence can be written as:
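For reference, the dynamic program I used can be sketched as follows (a minimal Python sketch; the function name is mine):

```python
def lps_length(seq):
    """Length of the longest palindromic subsequence, standard O(N^2) DP."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = length of the longest palindromic subsequence of seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if seq[i] == seq[j]:
                # matching endpoints extend the inner palindrome by 2
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]
```

It runs in $O(N^2)$ time and memory, which is what makes sequences of a few hundred elements practical to sample.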

$$\pi(N) = \left\langle\frac{M_p\{a_i\}}{N}\right\rangle$$

where $\left\langle\right\rangle$ denotes an average over all $\{a_i\}$ of length $N$. In principle, random sampling should provide a good approximation. Numerically, $\pi(N)$ appears to converge as $N\to\infty$ to a constant depending only on $n$, the number of elements from which the sequence is constructed. For binary sequences this limit appears to be $\approx 0.8$, while for $n=10$ it appears to be $\approx 0.45$.
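The sampling described above can be reproduced with a short Monte Carlo sketch (Python; function names are mine). It uses the standard identity that a palindromic subsequence of a sequence is exactly a common subsequence of the sequence and its reversal, so $M_p$ can be computed as an LCS:

```python
import random

def lcs_length(a, b):
    # Length of the longest common subsequence, O(|a||b|) DP with a rolling row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lps_length(seq):
    # A palindromic subsequence of seq is a common subsequence of seq and its reversal.
    return lcs_length(seq, seq[::-1])

def pi_estimate(n, N, samples=256, seed=0):
    # Monte Carlo estimate of pi_n(N) = <M_p / N> over uniform random sequences.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        seq = [rng.randrange(n) for _ in range(N)]
        total += lps_length(seq) / N
    return total / samples
```

For example, `pi_estimate(2, 256)` should land near the $0.800$ value in the table below, and exhaustive enumeration over all $2^N$ binary sequences reproduces the exact small-$N$ values $\pi_2(2)=\frac34$ and $\pi_2(3)=\frac56$ mentioned in the comments.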

The fact that the limit is a constant other than $1$ or $0$ indicates that the longest palindromic subsequence scales linearly with the length of the sequence, and the fact that the calculated values appear to be rational numbers is quite curious.

Here are some rough approximations of $\lim_{N\to\infty}\pi_n(N)$, generated using 256 random sequences of length 256. $\sigma$ denotes the standard deviation of the sampled distribution of $M_p/N$, not the standard error of the sample mean.

\begin{array}{c|cc}
n & \pi_n(N) & \sigma \\ \hline
2 & 0.80035 & 0.01647 \\
3 & 0.70724 & 0.01615 \\
4 & 0.64514 & 0.01544 \\
5 & 0.59718 & 0.01697 \\
6 & 0.56046 & 0.01617 \\
7 & 0.53024 & 0.01627 \\
8 & 0.50377 & 0.01527 \\
9 & 0.48380 & 0.01520 \\
10 & 0.46349 & 0.01489
\end{array}

I would like to know if there is a way to derive the asymptotic behavior observed in these numerical results. The complexity of the algorithm for $M_p$ makes it difficult to derive a probability distribution, and the rational number results hint that there may be a simpler approach. Any input would be appreciated.

How certain is the observation that these results are rational? Can you be more specific? Does it work for other sizes of alphabet than 2 and 10? I can make some estimates of the mean relative length, so I can show that it is neither 0 nor 1. But if there is some direct formula, there is perhaps a nicer approach. – Mirek Olšák Mar 09 '17 at 16:21
    I am not certain they are rational, though they appear to be. The convergence is rather slow, and requires sequences of several hundred elements to obtain a good approximation. Once I've generated a bit more data, I'll include more quantitative evidence of my observations. – Kajelad Mar 09 '17 at 18:48
    Just a trivial observation: if $n = 1$ then $\pi(N) = 1$. – Smylic Mar 09 '17 at 21:00
    Have you computed it exactly for the binary case for small $N$? – graffe Mar 10 '17 at 10:57
Only the cases so small that they can be computed via enumeration. $\pi_2(1)=1$, $\pi_2(2)=\frac 34$, $\pi_2(3)=\frac 56$, etc. – Kajelad Mar 10 '17 at 20:16

1 Answer


For two words $W_1$ and $W_2$ over a common alphabet, let $L(W_1, W_2)$ be the length of the longest common subsequence of $W_1$ and $W_2$ (i.e. the longest word appearing as a subsequence of both). Define $$\gamma_n = \lim_{N \rightarrow \infty} \frac{E[L(W_1, W_2)]}{N},$$ where $W_1$ and $W_2$ are two independent words of length $N$ chosen uniformly from an alphabet of size $n$. This limit exists because the longest common subsequence length is superadditive when we concatenate words: we have $$L(W_1 X_1, W_2 X_2) \geq L(W_1, W_2) + L(X_1, X_2).$$
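The superadditivity inequality is easy to spot-check numerically (a Python sketch under the definitions above; `lcs_length` is my name for $L$):

```python
import random

def lcs_length(a, b):
    # Length of the longest common subsequence, O(|a||b|) DP with a rolling row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Spot-check L(W1 X1, W2 X2) >= L(W1, W2) + L(X1, X2) on random words:
# a common subsequence of (W1, W2) concatenated with one of (X1, X2)
# is always a common subsequence of the concatenations.
rng = random.Random(0)
for _ in range(200):
    w1, x1, w2, x2 = (tuple(rng.randrange(3) for _ in range(15)) for _ in range(4))
    assert lcs_length(w1 + x1, w2 + x2) >= lcs_length(w1, w2) + lcs_length(x1, x2)
```

Superadditivity is what lets Fekete's lemma guarantee that the limit defining $\gamma_n$ exists.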

Determining $\gamma_n$ is a longstanding open problem, and it seems likely that $\gamma_n$ is also the limiting fraction of the longest palindrome.

Why $\gamma_n$ is a lower bound for $\pi(N)$: Divide your sequence into two equal halves. Any common subsequence $u$ of the first half and the reversal of the second half gives a palindromic subsequence ($u$ followed by the reversal of $u$) of twice its length.

Why $\gamma_n$ is an upper bound for $\pi(N)$ (NOT RIGOROUS). Let $L_k$ be the length of the longest common subsequence between the first $k$ characters and the reversal of the last $N-k$ characters of your word. The longest palindrome can be thought of as the maximum of $2L_k$ over all $k$ between $1$ and $N$ ($k$ corresponds to the midpoint of your word).
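This split argument actually sandwiches $M_p$: each common subsequence of a prefix and the reversed suffix yields a palindrome of twice its length, and conversely any palindrome splits at some $k$, up to one possible middle character. A small Python check of $\max_k 2L_k \le M_p \le \max_k 2L_k + 1$ (names mine):

```python
import random

def lcs_length(a, b):
    # Length of the longest common subsequence, O(|a||b|) DP with a rolling row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lps_length(seq):
    # A palindromic subsequence of seq is a common subsequence of seq and its reversal.
    return lcs_length(seq, seq[::-1])

rng = random.Random(0)
for _ in range(100):
    seq = tuple(rng.randrange(2) for _ in range(30))
    # L_k: LCS of the first k characters and the reversed last N-k characters.
    best = max(2 * lcs_length(seq[:k], seq[k:][::-1]) for k in range(len(seq) + 1))
    assert best <= lps_length(seq) <= best + 1
```

The "$+1$" slack from odd-length palindromes is irrelevant to the limiting fraction; the real gap in the argument is the union bound over $k$ discussed below.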

By definition, the expected value of $L_{N/2}$ is asymptotically $\frac{N}{2} \gamma_n$. Furthermore, we have some amount of concentration: because changing a single character in either sequence can only change the longest common subsequence by $1$, it follows from McDiarmid's inequality (which in turn comes from Azuma's martingale inequality) that $L_k$ is almost always within $O(\sqrt{N})$ of its expectation. In particular, it is very unlikely (probability much smaller than $1/N$) for $2L_{N/2}$ to be at least $cN$ for any $c>\gamma_n$.

Intuitively, it should be even less likely for $L_k$ to be unusually large when the sequence is divided unevenly. If this were true, then what we wanted would follow from the union bound. I don't see an obvious way to show it though.

Wikipedia has quite a few references on the problem of determining $\gamma_n$ in its article on the Chvátal–Sankoff constants.

Kevin P. Costello
    Very interesting. I'll have to do some background reading to make sense of your upper bound proof, but it seems pretty convincing. I halfway suspected my question was related to a better known combinatorics problem, but I was previously unable to make such a connection. Thanks. – Kajelad Mar 11 '17 at 09:56