Let $W_n$ be the set of all words of length $n$ over the alphabet $\{a,b,c\}$. Let $L$ be the maximal number of consecutive $a$ letters in a word.

A. Find the generating function of the number of words in $W_n$ such that $L<3$.

B. Find the generating function of the number of words in $W_n$ such that $L<k$.

C. Find the expectation of $L$, i.e. $W(q,x)=\sum_n \sum_\pi x^n q^{L(\pi)}$. Then find its asymptotics.

My solution:

A. Every word is either the "empty word", or "a( )", or "b(every word)", or "c(every word)". Let $x$ mark the length of a single letter.

So, $W(x)=1+A_1(x) + 2xW(x)$

"a()" ="a" or "aa( )" or "ab(every word)" or "ac(every word)".

So, $A_1(x)=x+A_2(x)+2x^2W(x)$

"aa( )" = "aa" or "aab(every word)" or "aac(every word)".

So, $A_2(x)=x^2+2x^3W(x)$.
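The three equations above can be checked numerically. Eliminating $A_2$ and then $A_1$ gives $W(x)=1+x+x^2+(2x+2x^2+2x^3)W(x)$, which is a linear recurrence for the coefficients. The following is a minimal sketch that compares this recurrence with a brute-force enumeration; the function names are illustrative only.

```python
from itertools import product

def max_a_run(word):
    """Length of the longest block of consecutive 'a' letters in word."""
    best = cur = 0
    for ch in word:
        cur = cur + 1 if ch == "a" else 0
        best = max(best, cur)
    return best

def count_brute(n, k=3):
    """Number of words of length n over {a,b,c} with L < k, by enumeration."""
    return sum(1 for w in product("abc", repeat=n) if max_a_run(w) < k)

def count_recurrence(n_max):
    """Coefficients w_0..w_{n_max} of W = 1 + x + x^2 + (2x + 2x^2 + 2x^3) W."""
    w = []
    for n in range(n_max + 1):
        term = 1 if n <= 2 else 0  # contribution of 1 + x + x^2
        term += 2 * sum(w[n - i] for i in (1, 2, 3) if n - i >= 0)
        w.append(term)
    return w

coeffs = count_recurrence(8)
assert coeffs == [count_brute(n) for n in range(9)]
print(coeffs[:5])  # [1, 3, 9, 26, 76]
```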

After substituting, we get that the generating function is:

$$W(x)=\frac{1+x+x^2}{1-2x-2x^2-2x^3}=\frac{1-x^3}{1-3x+2x^4}$$


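B. This is the generalization of part A, and we proceed in the same way:

$W(x)=1+A_1(x)+2xW(x)$

$A_j(x)=x^j+A_{j+1}(x)+2x^{j+1}W(x)$ for $1\leq j\leq k-2$, for example $A_3(x)=x^3+A_4(x)+2x^4W(x)$,

$A_{k-1}(x)=x^{k-1}+2x^kW(x)$.

After solving this, we get:

$$W(x)=\frac{1+x+\cdots+x^{k-1}}{1-2x-2x^2-\cdots-2x^k}=\frac{1-x^k}{1-3x+2x^{k+1}}$$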

C. Let $x$ mark the length of a single letter and let $q$ mark each letter $a$. We can take the same approach as in part B, and see that:


Then we need to find the coefficients of $W(1,x)=W(x)$ and of $\left[\frac{d}{dq} W_k(q,x)\right]_{q=1}$; their ratio is the expectation. Another way is to look at $W_{L+1}-W_L$, so that $E=\sum_L L\,(W_{L+1}-W_L)$. But both ways are complicated for me, because these generating functions are not simple.

How can the coefficients of the two generating functions be calculated (and analyzed asymptotically)? In other words, how can the expectation be computed and analyzed asymptotically?


1 Answer


The following answer is based upon the Goulden-Jackson Cluster Method which provides a convenient technique to solve problems of this kind. We consider the set of words of length $n\geq 0$ built from an alphabet $\mathcal{V}=\{a,b,c\}$ and the set $B=\{a^k\}$ containing a bad word of length $k$, which is not allowed to be part of the words we are looking for.

Generating function $W_k(x)$: We derive a generating function $W_k(x)$ with the coefficient of $x^n$ being the number of valid words of length $n$.

According to the paper (p.7) the generating function $W_k(x)$ is \begin{align*} W_k(x)=\frac{1}{1-dx-\text{w}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|$ the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{w}(\mathcal{C})=\text{w}(\mathcal{C}[a^k])\tag{2} \end{align*}

We calculate according to the paper: \begin{align*} \text{w}(\mathcal{C}[a^k])&=-x^{k}-x\text{w}(\mathcal{C}[a^k])-x^2\text{w}(\mathcal{C}[a^k])-\cdots-x^{k-1}\text{w}(\mathcal{C}[a^k])\tag{3} \end{align*} The term $x^k$ on the right-hand side represents a $k$-run of $a$. Since more than one bad word may occur, i.e. more than one $k$-run of $a$, we have to take care of overlaps. This is done by the terms $x^j\text{w}(\mathcal{C}[a^k])$, $1\leq j\leq k-1$, which account for two $k$-runs of $a$ overlapping in $k-j$ letters. Some more details of this technique are given in this answer.

We obtain from (3): \begin{align*} \text{w}(\mathcal{C}[a^k])\left(1+x+\cdots+x^{k-1}\right)&=-x^k\\ \text{w}(\mathcal{C}[a^k])&=-\frac{x^k(1-x)}{1-x^k} \end{align*} It follows from (1) to (3) with $d=3$: \begin{align*} \color{blue}{W_k(x)}&=\frac{1}{1-dx-\text{w}(\mathcal{C})}\\ &=\frac{1}{1-3x+\frac{x^k(1-x)}{1-x^k}}\\ &\,\,\color{blue}{=\frac{1-x^k}{1-3x+2x^{k+1}}}\tag{4} \end{align*} We conclude that the number of valid words of length $n\geq 0$ is the coefficient of $x^n$ in (4), in accordance with OP's results (a) and (b).
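As a sanity check on (4), the following minimal Python sketch expands $\frac{1-x^k}{1-3x+2x^{k+1}}$ as a power series via the recurrence implied by the denominator and compares the coefficients with a brute-force count of words avoiding $a^k$. The helper names `series_W` and `brute_count` are illustrative and not taken from the paper.

```python
from itertools import product

def series_W(k, n_max):
    """Coefficients of (1 - x^k) / (1 - 3x + 2x^(k+1)) up to x^n_max."""
    c = []
    for n in range(n_max + 1):
        val = (1 if n == 0 else 0) - (1 if n == k else 0)  # numerator 1 - x^k
        if n >= 1:
            val += 3 * c[n - 1]                            # from -3x
        if n >= k + 1:
            val -= 2 * c[n - k - 1]                        # from +2x^(k+1)
        c.append(val)
    return c

def brute_count(k, n):
    """Number of words of length n over {a,b,c} that do not contain a^k."""
    bad = "a" * k
    return sum(1 for w in product("abc", repeat=n) if bad not in "".join(w))

for k in range(1, 5):
    assert series_W(k, 8) == [brute_count(k, n) for n in range(9)]
print(series_W(3, 4))  # [1, 3, 9, 26, 76]
```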

In the following we use the coefficient of operator $[x^n]$ to denote the coefficient of a series.

Coefficient extraction: We obtain from (4) for $n\geq 1$: \begin{align*} \color{blue}{[x^n]}\color{blue}{W_k(x)}&=[x^n]\frac{1-x^k}{1-x\left(3-2x^k\right)}\\ &=[x^n]\sum_{j=0}^{\infty}x^j\left(3-2x^k\right)^j\left(1-x^k\right)\tag{4.1}\\ &=\sum_{j=0}^{n}[x^{n-j}]\left(3-2x^k\right)^j\left(1-x^k\right)\tag{4.2}\\ &=\sum_{j=0}^{n}[x^{j}]\left(3-2x^k\right)^{n-j}\left(1-x^k\right)\tag{4.3}\\ &=\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor} [x^{kj}]\sum_{l=0}^{n-kj}\binom{n-kj}{l}(-2)^l3^{n-kj-l}x^{kl}\\ &\qquad-\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor} [x^{kj}]\sum_{l=0}^{n-kj}\binom{n-kj}{l}(-2)^l3^{n-kj-l}x^{k(l+1)}\tag{4.4}\\ &\color{blue}{=\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}\binom{n-kj}{j}(-2)^j3^{n-(k+1)j}}\\ &\quad\,\,\color{blue}{-\sum_{j=1}^{\left\lfloor\frac{n}{k}\right\rfloor} \binom{n-kj}{j-1}(-2)^{j-1}3^{n+1-(k+1)j}[[n\geq k]]}\tag{4.5}\\ \end{align*}


  • In (4.1) we use the geometric series expansion.

  • In (4.2) we use the linearity of the coefficient of operator and apply the rule $[x^{p-q}]A(x)=[x^p]x^qA(x)$. We also set the upper limit of the series to $n$, since other values of $j$ do not contribute.

  • In (4.3) we change the order of summation $j\to n-j$.

  • In (4.4) we use the binomial theorem. Since only powers of $x$ that are multiples of $k$ occur, we keep only the corresponding summands and substitute $j\to kj$.

  • In (4.5) we select the coefficient of $x^{kj}$ and use Iverson brackets.
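The closed form (4.5) can be evaluated directly. Here is a minimal sketch, with illustrative function names, that compares it against the series coefficients of (4). The Iverson bracket is implicit in the code, because the second sum is empty when $n<k$; terms with a vanishing binomial coefficient are skipped so that all powers of $3$ stay non-negative.

```python
from math import comb

def closed_form(n, k):
    """Evaluate (4.5): number of words of length n over {a,b,c} avoiding a^k."""
    first = second = 0
    for j in range(n // k + 1):
        m = n - k * j
        if j <= m:                       # binomial(m, j) is nonzero
            first += comb(m, j) * (-2) ** j * 3 ** (m - j)
        if 1 <= j <= m + 1:              # binomial(m, j-1) is nonzero
            second += comb(m, j - 1) * (-2) ** (j - 1) * 3 ** (m - j + 1)
    return first - second

def series_W(k, n_max):
    """Coefficients of (1 - x^k) / (1 - 3x + 2x^(k+1)) up to x^n_max."""
    c = []
    for n in range(n_max + 1):
        val = (1 if n == 0 else 0) - (1 if n == k else 0)
        if n >= 1:
            val += 3 * c[n - 1]
        if n >= k + 1:
            val -= 2 * c[n - k - 1]
        c.append(val)
    return c

for k in range(1, 6):
    assert [closed_form(n, k) for n in range(21)] == series_W(k, 20)
print(closed_form(3, 3), closed_form(4, 3))  # 26 76
```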

Number of occurrences of $a$ (part I):

Next we want to calculate the number of occurrences of the letter $a$ in the valid words counted by $W_k(x)$. In order to do so, we mark the letter $a$ in (4) with $q$ and obtain the generating function \begin{align*} \color{blue}{W_k(x;q)}&=\frac{1}{1-(q+2)x+\frac{(qx)^k(1-qx)}{1-(qx)^k}}\\ &\,\,\color{blue}{=\frac{1-(qx)^k}{1-(q+2)x+2q^kx^{k+1}}}\tag{5} \end{align*} where $q$ in $(q+2)$ marks the letter $a$ and the summand $2$ represents the occurrence of the other letters $b$ and $c$.

Before we go on it's time for a check.

Plausibility check $W_3(x;q)$:

As a small plausibility check we take $k=3$, consider the set of bad words $B=\{aaa\}$ and obtain with some help of Wolfram Alpha: \begin{align*} W_3(x;q)&=\frac{1-(qx)^3}{1-(q+2)x+2q^3x^4}\\ &=1+(q+2)x+(q^2+4q+4)x^2\\ &\qquad+(6q^2+\color{blue}{12}q+8)x^3\\ &\qquad+(4q^3+24q^2+32q+16)x^4+\cdots\tag{6} \end{align*} Here we have for instance as coefficient of $x^3$ the expression $$\left.\left(6q^2+\color{blue}{12}q+8\right)\right|_{q=1}=26$$ which counts all valid words of length $3$, namely all $3^3=27$ three-letter words from $\mathcal{V}=\{a,b,c\}$ minus the one bad word $aaa$. The blue marked term $\color{blue}{12}$ indicates we have $12$ valid words containing exactly one $a$, which are \begin{align*} &abb\quad abc\quad acb\quad acc\\ &bab\quad bac\quad cab\quad cac\\ &bba\quad bca\quad cba\quad cca\\ \end{align*}
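The coefficients in (6) can also be reproduced without a computer algebra system. The following minimal sketch (the function name is illustrative) enumerates all words of length $n$ that avoid $aaa$ and groups them by their number of $a$'s, which gives the coefficient of $q^i$ in $[x^n]W_3(x;q)$.

```python
from itertools import product
from collections import Counter

def a_count_distribution(n, k=3):
    """Sorted list of (i, number of words of length n avoiding a^k with i letters a)."""
    bad = "a" * k
    dist = Counter()
    for letters in product("abc", repeat=n):
        word = "".join(letters)
        if bad not in word:
            dist[word.count("a")] += 1
    return sorted(dist.items())

print(a_count_distribution(3))  # [(0, 8), (1, 12), (2, 6)]
print(a_count_distribution(4))  # [(0, 16), (1, 32), (2, 24), (3, 4)]
```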

Number of occurrences of $a$ (part II):

The number of occurrences of $a$ in valid words of length $n$ can be calculated as already indicated by OP as the coefficient of $x^n$ of \begin{align*} \left.\frac{\partial}{\partial q}W_k(x;q)\right|_{q=1} &=\left.\frac{\partial}{\partial q}\left(\frac{1-(qx)^k}{1-(q+2)x+2q^kx^{k+1}}\right)\right|_{q=1}\\ &=\left(-\frac{kx(qx)^{k-1}}{1-(q+2)x+2q^kx^{k+1}}\right.\\ &\qquad\quad\left.\left.-\frac{(2kq^{k-1}x^{k+1}-x)\left(1-(qx)^k\right)}{\left(1-(q+2)x+2q^kx^{k+1}\right)^2}\right)\right|_{q=1}\\ &=\frac{x-kx^k+(k-1)x^{k+1}}{\left(1-3x+2x^{k+1}\right)^2}\tag{7} \end{align*}

Another plausibility check: We take $k=3$ in (7) and get with some help of WA: \begin{align*} \frac{x-3x^3+2x^{4}}{\left(1-3x+2x^4\right)^2} =x+6x^2+24x^3+\color{blue}{92}x^4+332x^5+\cdots \end{align*} We look at the blue marked value $\color{blue}{92}$ and we see, the coefficient $(4q^3+24q^2+32q+16)$ of $x^4$ in (6) gives \begin{align*} 3\cdot 4+2\cdot 24+1\cdot 32 = 92 \end{align*} as expected.
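The expansion of (7) can be reproduced with plain power-series division. This is a minimal sketch with illustrative helper names (`poly_mul`, `series_div`, `total_a_series`); it expands $\frac{x-kx^k+(k-1)x^{k+1}}{(1-3x+2x^{k+1})^2}$ and compares the result with the total number of $a$'s counted by brute force.

```python
from itertools import product

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def series_div(num, den, n_max):
    """Power-series coefficients of num/den up to x^n_max (requires den[0] == 1)."""
    out = []
    for n in range(n_max + 1):
        val = num[n] if n < len(num) else 0
        val -= sum(den[i] * out[n - i] for i in range(1, min(n, len(den) - 1) + 1))
        out.append(val)
    return out

def total_a_series(k, n_max):
    """Coefficients of (x - k x^k + (k-1) x^(k+1)) / (1 - 3x + 2x^(k+1))^2."""
    den = [1, -3] + [0] * (k - 1) + [2]
    num = [0] * (k + 2)
    num[1] += 1
    num[k] -= k
    num[k + 1] += k - 1
    return series_div(num, poly_mul(den, den), n_max)

def total_a_brute(k, n):
    """Total number of letters a over all words of length n avoiding a^k."""
    bad = "a" * k
    words = map("".join, product("abc", repeat=n))
    return sum(w.count("a") for w in words if bad not in w)

for k in range(1, 5):
    assert total_a_series(k, 8) == [total_a_brute(k, n) for n in range(9)]
print(total_a_series(3, 5))  # [0, 1, 6, 24, 92, 332]
```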

Coefficient extraction: We obtain from (7) similarly as we did in (4) for $n\geq 1$: \begin{align*} \color{blue}{[x^n]}&\color{blue}{\left.\frac{\partial}{\partial q}W_k(x;q)\right|_{q=1}}\\ &=[x^n]\frac{x-kx^k+(k-1)x^{k+1}}{\left(1-x\left(3-2x^k\right)\right)^2}\\ &=\left([x^{n-1}]\left(1+(k-1)x^k\right)-k[x^n]x^k\right)\sum_{j=0}^{\infty}\binom{-2}{j}(-x)^j\left(3-2x^k\right)^j\\ &=\sum_{j=0}^{n-1}(j+1)[x^{n-1-j}]\left(3-2x^k\right)^j\left(1+(k-1)x^k\right)\\ &\qquad-k\sum_{j=0}^n(j+1)[x^{n-j}]\left(3-2x^k\right)^jx^k\\ &=\sum_{j=0}^{n-1}(n-j)[x^{j}]\left(3-2x^k\right)^{n-j-1}\left(1+(k-1)x^k\right)\\ &\qquad-k\sum_{j=0}^n(n-j+1)[x^{j}]\left(3-2x^k\right)^{n-j}x^k\\ &=\sum_{j=0}^{\left\lfloor\frac{n-1}{k}\right\rfloor}(n-kj) [x^{kj}]\sum_{l=0}^{n-1-kj}\binom{n-1-kj}{l}(-2)^l3^{n-1-kj-l}x^{kl}\\ &\quad+(k-1)\sum_{j=0}^{\left\lfloor\frac{n-1}{k}\right\rfloor}(n-kj)\\ &\qquad\quad\cdot[x^{kj}]\sum_{l=0}^{n-1-kj}\binom{n-1-kj}{l}(-2)^l3^{n-1-kj-l}x^{k(l+1)}\\ &\quad-k\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}(n-kj+1) [x^{kj}]\sum_{l=0}^{n-kj}\binom{n-kj}{l}(-2)^l3^{n-kj-l}x^{k(l+1)}\\ &\color{blue}{=\sum_{j=0}^{\left\lfloor\frac{n-1}{k}\right\rfloor}(n-kj)\binom{n-1-kj}{j}(-2)^j3^{n-1-(k+1)j}}\\ &\quad\color{blue}{+(k-1)\sum_{j=1}^{\left\lfloor\frac{n-1}{k}\right\rfloor}(n-kj) \binom{n-1-kj}{j-1}(-2)^{j-1}3^{n-(k+1)j}[[n\geq k+1]]}\\ &\quad\,\,\color{blue}{-k\sum_{j=1}^{\left\lfloor\frac{n}{k}\right\rfloor}(n-kj+1) \binom{n-kj}{j-1}(-2)^{j-1}3^{n+1-(k+1)j}[[n\geq k]]}\\ \end{align*}
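The three blue sums can be evaluated numerically and compared with a brute-force count. The sketch below uses illustrative function names and relies only on the summation limits (an empty sum contributes zero), so no explicit Iverson brackets appear in the code. Terms with a vanishing binomial coefficient are skipped to keep all powers of $3$ non-negative.

```python
from itertools import product
from math import comb

def total_a_closed(n, k):
    """Total number of a's in words of length n avoiding a^k, via the three sums."""
    s1 = s2 = s3 = 0
    for j in range((n - 1) // k + 1):
        m = n - 1 - k * j
        if j <= m:                                   # binomial(m, j) is nonzero
            s1 += (n - k * j) * comb(m, j) * (-2) ** j * 3 ** (m - j)
        if 1 <= j <= m + 1:                          # binomial(m, j-1) is nonzero
            s2 += (n - k * j) * comb(m, j - 1) * (-2) ** (j - 1) * 3 ** (m - j + 1)
    for j in range(1, n // k + 1):
        m = n - k * j
        if j - 1 <= m:                               # binomial(m, j-1) is nonzero
            s3 += (m + 1) * comb(m, j - 1) * (-2) ** (j - 1) * 3 ** (m - j + 1)
    return s1 + (k - 1) * s2 - k * s3

def total_a_brute(n, k):
    """Same quantity by enumerating all 3^n words."""
    bad = "a" * k
    words = map("".join, product("abc", repeat=n))
    return sum(w.count("a") for w in words if bad not in w)

for k in range(1, 5):
    for n in range(1, 9):
        assert total_a_closed(n, k) == total_a_brute(n, k)
print([total_a_closed(n, 3) for n in range(1, 6)])  # [1, 6, 24, 92, 332]
```

Dividing `total_a_closed(n, k)` by the number of valid words from (4.5) gives the average number of $a$'s per valid word of length $n$.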

  • How do we get to the [underlying pdf of this](http://arxiv.org/abs/math/9806036)? – user2661923 Dec 27 '20 at 17:55
  • Hi @Markus Scheuer, thanks. I calculated the gf W(x,q) once again and got: $W(x,q)=\frac{1-q^kx^k}{1-(q+2)x+2q^k-x^{k+1}}$ by taking the same approach as in part B (which is different from what you got). In (4) it should be $1-x^k$, right? My problem is that I need to find the asymptotics of the coefficient $[x^n]W(x,1)$ and of $[x^n]d/dq W(x,q)_{q=1}$, and then the ratio of the two, which will be the expectation. – user726608 Dec 27 '20 at 19:13
  • @user726608: If we agree on (6) the coeff. of $x^4$ is $92$, but $[x^4](x-3x^3)/(1-3x+2x^4)^2=90$. – epi163sqrt Dec 28 '20 at 08:18
  • (It looks like I had an error, though I did it many times.) Still, I am not getting the coefficient of the gf $W_k(x)$, and it is clear that it is not trivial to analyze the asymptotics of $d/dq W_k(x,q)|_{q=1}$. Do you have an idea of how to analyze the asymptotics of the expectation (with the given statistic)? Because this is where I am lost! @Markus Scheuer and thanks for the help :) – user726608 Dec 28 '20 at 11:47
  • @user726608: You're welcome and you're right, such calculations *are* error-prone. Regrettably I'm not aware of an asymptotic approach to this problem. Do you agree with (7)? Where are you stuck? – epi163sqrt Dec 28 '20 at 12:52
  • Yes, I agree with (7). I am stuck in calculating the coefficient of the gf in (4) @Markus Scheuer – user726608 Dec 28 '20 at 14:02
  • @user726608: I've added some info between (3) and (4) which might be helpful. Regards, – epi163sqrt Dec 28 '20 at 16:13
  • Hi @Markus Scheuer, from these coefficients, which are not simple, can we conclude something about the asymptotics? – user726608 Dec 30 '20 at 13:36
  • @user726608: I don't think it's easy to derive asymptotic information from this complicated coefficient representation. I'd rather think you should find out information about the dominant singularity of the generating functions. Regrettably I'm not aware of it for general $k$. You might want to check *[Analytic Combinatorics](http://algo.inria.fr/flajolet/Publications/ch4567.pdf)* by P. Flajolet for relevant information. Btw. I'm also interested to find out what's asymptotically going on here. – epi163sqrt Dec 30 '20 at 13:50
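Following the dominant-singularity suggestion in the last comment, here is a minimal numerical sketch for the counting function (4) only; it does not address the expectation. It assumes that the relevant singularity is the root $\rho$ of $1-3x+2x^{k+1}$ in $[1/3,1/2]$ (the root $x=1$ cancels against the numerator $1-x^k$), locates it by bisection, and compares $1/\rho$ with the ratio of consecutive coefficients. The function names are illustrative.

```python
def dominant_root(k, iters=100):
    """Root of 1 - 3x + 2x^(k+1) in [1/3, 1/2], found by bisection."""
    f = lambda x: 1 - 3 * x + 2 * x ** (k + 1)
    lo, hi = 1 / 3, 1 / 2          # f(lo) > 0 >= f(hi) for every k >= 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def series_W(k, n_max):
    """Coefficients of (1 - x^k) / (1 - 3x + 2x^(k+1)) up to x^n_max."""
    c = []
    for n in range(n_max + 1):
        val = (1 if n == 0 else 0) - (1 if n == k else 0)
        if n >= 1:
            val += 3 * c[n - 1]
        if n >= k + 1:
            val -= 2 * c[n - k - 1]
        c.append(val)
    return c

for k in (2, 3, 4):
    c = series_W(k, 40)
    growth = c[40] / c[39]         # empirical growth rate of the counts
    print(k, growth, 1 / dominant_root(k))
    assert abs(growth - 1 / dominant_root(k)) < 1e-9
```

In this sketch the number of valid words grows roughly like $\rho^{-n}$, and $1/\rho$ moves toward $3$ as $k$ increases.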