Another form of this question is: Does there exist a gap-1 square-free infinite word using the alphabet {A,B,C,D}?

Normally, square-free in this context means that no sub-word occurs twice in a row, i.e., the word contains no factor of the form XX. For example, the words AA, CABABC, and ABCABC are not square-free. A gap-1 square-free word not only avoids identical sub-words twice in a row, but also avoids identical sub-words that are separated by exactly one letter. For example, ABA, DCABCAD, and ABCDABC are all square-free but not gap-1 square-free. In other words, the patterns that a gap-1 square-free word avoids are exactly XX, XAX, XBX, XCX, and XDX.
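For experimenting with examples, here is a minimal Python sketch of the two definitions. The names `is_square_free` and `find_forbidden` are illustrative helpers, not from the original post, and the scan is a naive one over all factors, so it is only meant for short words.

```python
def is_square_free(word: str) -> bool:
    """True if no factor of the form XX occurs in word."""
    n = len(word)
    return not any(
        word[i:i + L] == word[i + L:i + 2 * L]
        for i in range(n)
        for L in range(1, (n - i) // 2 + 1)
    )


def find_forbidden(word: str):
    """Return (start, X, y) for the first factor XX (y == "") or XyX
    (y a single letter), scanning by start position, then by |X|.
    Return None if word is gap-1 square-free."""
    n = len(word)
    for start in range(n):
        for L in range(1, (n - start) // 2 + 1):
            x = word[start:start + L]
            if word[start + L:start + 2 * L] == x:          # pattern XX
                return start, x, ""
            end = start + 2 * L + 1
            if end <= n and word[start + L + 1:end] == x:   # pattern XyX
                return start, x, word[start + L]
    return None


for w in ["ABA", "DCABCAD", "ABCDABC"]:
    print(w, is_square_free(w), find_forbidden(w))
# ABA True (0, 'A', 'B')
# DCABCAD True (1, 'CA', 'B')
# ABCDABC True (0, 'ABC', 'D')
```

So DCABCAD fails because of CA B CA, and ABCDABC because of ABC D ABC.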

It can be shown by exhaustion that there are only finitely many gap-1 square-free words over the alphabet {A,B,C}; none of them is longer than four letters. This is the entire list: A, AB, ABC, ABCA, AC, ACB, ACBA, B, BA, BAC, BACB, BC, BCA, BCAB, C, CA, CAB, CABC, CB, CBA, CBAC
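The exhaustive list can be double-checked with a small Python sketch (function names are illustrative). It grows words one letter at a time and discards any word containing XX or XyX. That is enough to find every such word, because every prefix of a gap-1 square-free word is again gap-1 square-free. The loop terminates only because the set is finite for three letters.

```python
def is_gap1_square_free(word: str) -> bool:
    n = len(word)
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            x = word[i:i + L]
            if word[i + L:i + 2 * L] == x:                                  # XX
                return False
            if i + 2 * L + 1 <= n and word[i + L + 1:i + 2 * L + 1] == x:  # XyX
                return False
    return True


def all_gap1_square_free(alphabet: str) -> list:
    """Every nonempty gap-1 square-free word over alphabet (finite case only)."""
    found, frontier = [], [""]
    while frontier:
        # Extend each surviving word by one letter and keep the valid ones.
        frontier = [w + c for w in frontier for c in alphabet
                    if is_gap1_square_free(w + c)]
        found.extend(frontier)
    return sorted(found)


words = all_gap1_square_free("ABC")
print(len(words), max(map(len, words)))  # 21 4
```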

A reduction of the problem can be obtained by first establishing a correspondence between words over the alphabet {A,B,C,D} and directed graphs on four vertices labeled A, B, C, D. A word with $N$ letters corresponds to a graph with $N-1$ ordered edges: the tail of the $k^{th}$ edge is the vertex labeled by the $k^{th}$ letter of the word, and the head of the $k^{th}$ edge is the vertex labeled by the $(k+1)^{th}$ letter, for $1\leq k\leq N-1$. For example, the word ABC corresponds to the four-vertex, two-edge graph whose first edge has its tail on vertex A and its head on vertex B, and whose second edge has its tail on vertex B and its head on vertex C. None of these graphs can have loops, because a gap-1 square-free word never has two equal adjacent letters. The tail of the second edge must be the head of the first edge, and the head of the second edge cannot be the tail of the first edge, because the word avoids XAX, XBX, XCX, and XDX. Under these conditions there is only one two-edge graph up to isomorphism (relabeling of the vertices). This implies that every gap-1 square-free word of length three or more that does not start with ABC can be turned, by a relabeling of the alphabet, into one that does start with ABC. So to show that there are no infinite gap-1 square-free words, it is sufficient to show that no infinite such word starts with ABC.
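The relabeling step can be sketched in Python as follows. `canonical_relabel` and `count_by_prefix` are hypothetical helpers: the first renames the first three (necessarily distinct) letters to A, B, C and the remaining letter to D. The brute-force count is a sanity check of the reduction. Since relabeling gives a bijection between words of a fixed length $\geq 3$ and pairs (permutation of the alphabet, word starting with ABC), the total should be $4! = 24$ times the number of words starting with ABC.

```python
from itertools import product

ALPHABET = "ABCD"


def is_gap1_square_free(word: str) -> bool:
    n = len(word)
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            x = word[i:i + L]
            if word[i + L:i + 2 * L] == x:                                  # XX
                return False
            if i + 2 * L + 1 <= n and word[i + L + 1:i + 2 * L + 1] == x:  # XyX
                return False
    return True


def canonical_relabel(word: str) -> str:
    """Hypothetical helper: rename letters so the word starts with ABC."""
    a, b, c = word[:3]
    if len({a, b, c}) != 3:
        raise ValueError("first three letters must be distinct")
    (d,) = set(ALPHABET) - {a, b, c}          # the one unused letter
    return word.translate(str.maketrans({a: "A", b: "B", c: "C", d: "D"}))


def count_by_prefix(n: int):
    """(number of gap-1 square-free words of length n, number starting with ABC)."""
    good = [w for w in map("".join, product(ALPHABET, repeat=n))
            if is_gap1_square_free(w)]
    return len(good), sum(w.startswith("ABC") for w in good), good


print(canonical_relabel("DBACDB"))  # ABCDAB
total, abc, good = count_by_prefix(6)
print(total == 24 * abc)            # True
```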

Here are a couple of links that might be useful:

Does there exist an infinite number string without any 'refrain'?


EDIT: I wrote a computer program to search for long gap-1 square-free words over the alphabet {A,B,C,D}. There exist words of this type that are more than $10^6$ letters long.
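The post does not include the program, so the following is only a hypothetical reconstruction of the idea: a plain depth-first backtracking search in Python. It extends the word one letter at a time and, since the word before the new letter is already known to be clean, only tests suffixes. Each step rescans all suffixes, so this sketch is far too slow for $10^6$ letters; it illustrates the approach and is not the author's code.

```python
def ends_badly(word: str) -> bool:
    """True if word ends with a factor XX or XyX (X nonempty, y one letter)."""
    n = len(word)
    for L in range(1, n // 2 + 1):
        x = word[n - L:]
        if word[n - 2 * L:n - L] == x:                              # ... X X
            return True
        if 2 * L + 1 <= n and word[n - 2 * L - 1:n - L - 1] == x:   # ... X y X
            return True
    return False


def is_gap1_square_free(word: str) -> bool:
    # Every factor ends at some position, so testing all prefixes suffices.
    return not any(ends_badly(word[:k]) for k in range(2, len(word) + 1))


def search(target: int, alphabet: str = "ABCD"):
    """Backtracking search for a gap-1 square-free word of length target.
    Returns None if the whole search space is exhausted."""
    word = ""
    next_try = [0]   # next_try[k]: index of the next letter to try at position k
    while len(word) < target:
        k = len(word)
        if next_try[k] == len(alphabet):
            if k == 0:
                return None          # no word of this length exists
            next_try.pop()           # give up on position k ...
            word = word[:-1]         # ... and resume at position k - 1
            continue
        candidate = word + alphabet[next_try[k]]
        next_try[k] += 1
        if not ends_badly(candidate):
            word = candidate
            next_try.append(0)
    return word


w = search(100)
print(len(w), is_gap1_square_free(w))  # 100 True
print(search(5, "ABC"))                # None (no such word has 5 letters)
```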

EDIT $2$: There exists an infinite gap-1 square-free word using the alphabet {A,B,C,D,E,F}. Proof:

Let $I$ be an infinite square-free word over the alphabet {A,B,C}. (There are examples of these in the links I have already provided.) Define $I^*$ as follows: $I^*$ is obtained by replacing every instance of A in $I$ with AD, every instance of B with BE, and every instance of C with CF. I will call the AD, BE, and CF pairs "blocks". Let $S_1$ be the set of letters {A,B,C} and let $S_2$ be the set of letters {D,E,F}. Let $x_1$ and $x_2$ be sub-words of $I^*$ that have the same length and are either adjacent or one letter apart. When comparing the letters of the two words to see if they are the same, there are two cases. In the first case, the blocks of $x_1$ and $x_2$ are out of alignment. For example, take $x_1$ = ADB and $x_2$ = ECF: the first two letters of $x_1$ are in one block and the last letter is in a second block, whereas the first letter of $x_2$ is in one block and the last two letters are in a second block. If the blocks are out of alignment, no letter of $x_1$ matches the corresponding letter of $x_2$, because the two letters lie in opposite sets. (If the first letter of $x_1$ is in $S_1$, then the first letter of $x_2$ is in $S_2$, and so on.) In the second case the blocks are aligned, and the first letter of a block in $x_1$ matches the first letter of the corresponding block in $x_2$ if and only if the second letters of those two blocks match. So $x_1$ matches $x_2$ if and only if the blocks of $x_1$ match the blocks of $x_2$. In case 2, the blocks of $x_1$ must be adjacent to the blocks of $x_2$, so a match would produce a square in the sequence of blocks. So it is enough to show that the sequence of blocks of $I^*$ is square-free. There is a direct correspondence between the blocks of $I^*$ and the letters of $I$, and $I$ is square-free by definition. QED
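A quick numerical check of the construction, as a Python sketch. The choice of $I$ is an assumption: the post only says examples are in the links, so this uses the fixed point of the substitution A→ABC, B→AC, C→B (the same substitution, with letters 1, 2, 3, appears in the first answer), which is a known square-free ternary word. `has_repeat` is an illustrative helper, and checking a finite prefix is evidence, not a proof.

```python
# Assumed choice of I: iterate A -> ABC, B -> AC, C -> B starting from "A".
SUBST = {"A": "ABC", "B": "AC", "C": "B"}
BLOCKS = {"A": "AD", "B": "BE", "C": "CF"}


def ternary_prefix(n: int) -> str:
    """First n letters of the fixed point of SUBST starting from "A"."""
    word = "A"
    while len(word) < n:
        word = "".join(SUBST[c] for c in word)
    return word[:n]


def has_repeat(word: str, gaps) -> bool:
    """True if some factor X reappears after exactly g letters, for g in gaps.
    gaps=(0,) looks for squares XX; gaps=(0, 1) also looks for XyX."""
    n = len(word)
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            for g in gaps:
                end = i + 2 * L + g
                if end <= n and word[i:i + L] == word[i + L + g:end]:
                    return True
    return False


I = ternary_prefix(150)
I_star = "".join(BLOCKS[c] for c in I)   # A -> AD, B -> BE, C -> CF
print(I_star[:12])                       # ADBECFADCFBE
print(has_repeat(I, (0,)), has_repeat(I, (0, 1)), has_repeat(I_star, (0, 1)))
# False True False
```

The middle value shows why the doubling is needed: $I$ itself is square-free but not gap-1 square-free (it starts ABCAC, which contains C A C).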

  • Have you read chapter 3 of Lothaire: Algebraic Combinatorics on Words? Proposition 3.1.2 states that the pattern $XYX$ is unavoidable. But you have the restriction of Y to length one or the four patterns you provide with only one variable and one constant each. Flipping through the chapter I have not seen any treatment of patterns with constants. But maybe reading it can give you some inspirations anyway. – Peter Leupold Nov 10 '18 at 21:26
  • when you say "there are no sub-words twice in a row that avoids the pattern XX" don't you mean "there are no sub-words twice in a row that follows the pattern XX"? If you say "avoids" then this comes to a double negation, and perhaps not what was intended? – Mirko Sep 12 '19 at 03:27
  • @Mirko yes I think you are right I'll edit the post accordingly – quantus14 Sep 12 '19 at 03:46

3 Answers


Okay, I apologize for some of my previous sloppy attempts. This time I’m pretty sure my answer is correct.

In this related post, it is asked whether there exists an infinite string over the three characters $1,2,3$ that avoids patterns of the form $XX$. Such sequences do exist, and the top-voted answer shows how to construct one by repeatedly applying the transformation $1\mapsto 123$, $2\mapsto 13$, and $3\mapsto 2$, starting from $1$. Let’s call this sequence $\mathcal{S}$.
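A short Python sketch that generates a prefix of $\mathcal{S}$ and checks it for squares. It assumes the iteration starts from `1`, which is consistent with the $U_1 V_2 U_3 V_1\ldots$ sequence displayed in this answer; `sequence_S` is an illustrative name, and checking a finite prefix is evidence rather than a proof.

```python
RULES = {"1": "123", "2": "13", "3": "2"}


def sequence_S(n: int) -> str:
    """First n terms of S, obtained by iterating RULES starting from "1"."""
    s = "1"
    while len(s) < n:
        s = "".join(RULES[c] for c in s)   # each iterate extends the previous one
    return s[:n]


def is_square_free(word: str) -> bool:
    n = len(word)
    return not any(
        word[i:i + L] == word[i + L:i + 2 * L]
        for i in range(n)
        for L in range(1, (n - i) // 2 + 1)
    )


print(sequence_S(12))                   # 123132123213
print(is_square_free(sequence_S(500)))  # True
```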

Now we’re going to apply a complicated transformation to $\mathcal{S}$ to construct a sequence satisfying your constraints. Define the following five-letter blocks:

$$U_1=ABCDB,\space V_1=ABDCB$$ $$U_2=ACDBC,\space V_2=ACBDC$$ $$U_3=ADBCD,\space V_3=ADCBD$$

Now take the sequence $\mathcal{S}$ and proceed as follows. Replace each odd-numbered term (counting from the first term) with $U_i$, where $i$ is the value of that term of $\mathcal{S}$, and replace each even-numbered term with $V_i$. You will get something like

$$U_1 V_2 U_3 V_1 U_3 V_2 U_1 V_2 U_3 V_2 U_1 V_3 ...$$

Now replace each $U_i$ and $V_i$ with the corresponding string of $A,B,C,$ and $D$ defined above. This gives us the string

$$ABCDB \space ACBDC \space ADBCD \space ABDCB \space ADBCD \space ACBDC \space ABCDB\space ...$$

Call this string $\mathcal{S}^{*}$.
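The two substitution steps can be reproduced with a small Python sketch (the helper names are illustrative, and starting $\mathcal{S}$ from `1` is assumed, matching the display above). Odd-numbered terms are 0-based indices 0, 2, 4, ..., hence the `k % 2 == 0` test.

```python
RULES = {"1": "123", "2": "13", "3": "2"}
U = {"1": "ABCDB", "2": "ACDBC", "3": "ADBCD"}
V = {"1": "ABDCB", "2": "ACBDC", "3": "ADCBD"}


def sequence_S(n: int) -> str:
    s = "1"
    while len(s) < n:
        s = "".join(RULES[c] for c in s)
    return s[:n]


def encode(s: str, sep: str = "") -> str:
    """Replace the 1st, 3rd, 5th, ... terms by U_i and the others by V_i."""
    return sep.join((U if k % 2 == 0 else V)[c] for k, c in enumerate(s))


print(encode(sequence_S(7), " "))
# ABCDB ACBDC ADBCD ABDCB ADBCD ACBDC ABCDB
```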

We can check case by case that none of the concatenations $U_i V_j$ or $V_i U_j$ with $i\ne j$ contains a “forbidden pattern” of the form $XX$, $XAX$, $XBX$, $XCX$, or $XDX$. Also, the second and fifth characters of each five-letter block of this sequence uniquely determine which character ($1$, $2$, or $3$) that block corresponds to in $\mathcal{S}$. In other words, indexing both sequences from $0$, either of $\mathcal{S}^{*}[5n+1]$ and $\mathcal{S}^{*}[5n+4]$ uniquely determines the value of $\mathcal{S}[n]$, meaning that a pattern of the form $XX$, $XAX$, $XBX$, $XCX$, or $XDX$ in $\mathcal{S}^{*}$ would imply the existence of a pattern of the form $XX$ in $\mathcal{S}$, which is impossible.
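The finite checks in this paragraph can be run mechanically. The Python sketch below (illustrative names; $\mathcal{S}$ assumed to start from `1`) tests the twelve concatenations, the decoding of positions $5n+1$ and $5n+4$, and, as extra evidence only, a 500-letter prefix of $\mathcal{S}^*$.

```python
from itertools import permutations

RULES = {"1": "123", "2": "13", "3": "2"}
U = {"1": "ABCDB", "2": "ACDBC", "3": "ADBCD"}
V = {"1": "ABDCB", "2": "ACBDC", "3": "ADCBD"}
DECODE = {"B": "1", "C": "2", "D": "3"}   # 2nd/5th letter of a block -> term of S


def sequence_S(n: int) -> str:
    s = "1"
    while len(s) < n:
        s = "".join(RULES[c] for c in s)
    return s[:n]


def encode(s: str) -> str:
    return "".join((U if k % 2 == 0 else V)[c] for k, c in enumerate(s))


def has_forbidden(word: str) -> bool:
    """True if word contains XX or XyX for some nonempty X and letter y."""
    n = len(word)
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            x = word[i:i + L]
            if word[i + L:i + 2 * L] == x:
                return True
            if i + 2 * L + 1 <= n and word[i + L + 1:i + 2 * L + 1] == x:
                return True
    return False


# 1. The twelve concatenations U_i V_j and V_i U_j with i != j.
pairs_ok = all(not has_forbidden(U[i] + V[j]) and not has_forbidden(V[i] + U[j])
               for i, j in permutations("123", 2))

# 2. The characters at 0-based positions 5n+1 and 5n+4 both decode to S[n].
s = sequence_S(100)
t = encode(s)
decode_ok = all(DECODE[t[5 * n + 1]] == s[n] == DECODE[t[5 * n + 4]]
                for n in range(len(s)))

# 3. A finite prefix of S* has no forbidden pattern (evidence, not proof).
prefix_ok = not has_forbidden(t)
print(pairs_ok, decode_ok, prefix_ok)  # True True True
```

Note that the alternation of $U$ and $V$ matters: for instance $U_1 U_3$ = ABCDB ADBCD contains D B A D B, which is of the form $XAX$.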

Franklin Pezzuti Dyer
  • Very nice proof ! I find your last paragraph a little bit too concise and not explicit enough though. If you like my alternative presentation, you can put it in your answer and I'll delete my answer – Ewan Delanoy Sep 16 '19 at 10:19

This is not another answer, but is too long for a comment, and is an alternative presentation of the last paragraph in Franklin Pezzuti Dyer's accepted answer (which I find too concise and not explicit enough).

We wish to show that ${\cal S}^*$ has no forbidden pattern.

It suffices to show that ${\cal S}^*$ has no forbidden pattern of odd length, because it obviously has no forbidden pattern of length $2$, and any forbidden pattern of even length $>2$ contains a shorter forbidden pattern of odd length (remove the rightmost character).
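The "remove the rightmost character" step can be illustrated in a few lines of Python (`shorten_square` is a hypothetical helper): if $X = Yc$ with $c$ a single letter, then $XX = YcYc$, and dropping the final $c$ leaves $YcY$, which is a forbidden pattern of odd length $2|Y|+1$ and is still a factor of the original word.

```python
def shorten_square(pattern: str):
    """Given XX with |X| >= 2, return (Y, c) such that XX minus its last
    character equals Y + c + Y (a forbidden pattern of odd length)."""
    half, rem = divmod(len(pattern), 2)
    x = pattern[:half]
    if rem or half < 2 or pattern[half:] != x:
        raise ValueError("expected a pattern XX with |X| >= 2")
    y, c = x[:-1], x[-1]                 # X = Y c
    assert pattern[:-1] == y + c + y     # Y c Y c without its last c
    return y, c


print(shorten_square("ABCDABCD"))  # ('ABC', 'D')
```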

So assume that $I_1xI_2$ is a forbidden pattern in ${\cal S}^*$, where $x\in\{A,B,C,D\}$ and $I_1$ and $I_2$ are identical subwords of ${\cal S}^*$. The cases where $I_1$ has length $\leq 4$ can be ruled out by hand. So $I_1$ must have length $\geq 5$, and in particular it contains an $A$.

Denote by $B_0,B_1,B_2,\ldots ,B_r$ the successive blocks in ${\cal S}^*$ that intersect $I_1$. Then we can write $I_1=B'_0B_1B_2\ldots B_{r-1}B'_r$, where $B'_0$ is a (possibly empty) suffix of $B_0$ and $B'_{r}$ is a (possibly empty) prefix of $B_r$. Similarly, we can write $I_2=C'_0C_1C_2\ldots C_{r-1}C'_r$ with analogous notation. Now, because $A$ appears at the beginning of each block and only there, $I_1$ and $I_2$ are forced to be identical "blockwise"; in other words, $C'_0=B'_0$, $C'_r=B'_r$, and $C_k=B_k$ for $1\leq k \leq r-1$.

Now we know that the successive blocks that intersect $I_1xI_2$ are $B_0,B_1,B_2,\ldots ,B_{r-1},B'_rxB'_0,B_1,\ldots,B_{r-1},C_r$. Since the middle block $M=B'_rxB'_0$ has length $5$, the lengths of $B'_r$ and $B'_0$ add up to $4$, so one of them has length $\geq 2$. That subword then characterizes the letter of $\cal S$ encoded by the block it belongs to, so either $M$ encodes the same letter as $B_0$ (if $|B'_0|\geq 2$) or $M$ encodes the same letter as $C_r$ (if $|B'_r|\geq 2$). In both cases we deduce a square in $\cal S$, which is impossible.
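The two finite facts used above (that $A$ occurs only at the start of each block, and that a prefix or suffix of length $2$ to $4$ of a block pins down the letter of $\cal S$ it encodes) can be checked with a short Python sketch; `letters_matching` and `pieces_determine_letter` are illustrative names.

```python
U = {"1": "ABCDB", "2": "ACDBC", "3": "ADBCD"}
V = {"1": "ABDCB", "2": "ACBDC", "3": "ADCBD"}
BLOCKS = [(i, b) for table in (U, V) for i, b in table.items()]


def letters_matching(piece: str, prefix: bool) -> set:
    """Letters i of S whose block (U_i or V_i) has `piece` as a prefix
    (prefix=True) or as a suffix (prefix=False)."""
    return {i for i, b in BLOCKS
            if (b.startswith(piece) if prefix else b.endswith(piece))}


def pieces_determine_letter() -> bool:
    for i, b in BLOCKS:
        if b[0] != "A" or b.count("A") != 1:    # A only at the start
            return False
        # One of B'_r, B'_0 has between 2 and 4 letters.
        for k in range(2, 5):
            if letters_matching(b[:k], True) != {i}:
                return False
            if letters_matching(b[-k:], False) != {i}:
                return False
    return True


print(pieces_determine_letter())            # True
print(sorted(letters_matching("A", True)))  # ['1', '2', '3']
```

The last line shows why length $\geq 2$ is needed: a one-letter prefix (always $A$) carries no information.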

Ewan Delanoy

My short answer is there is, because:

  • Word1: ABCD
  • Word2: ABCDABCD


This always works because if XYX occurs, then two letters with a gap of one letter between them are the same. Here, the only pairs of letters with a gap of one letter between them are:

AC, BD, CA, and DB.

Math Bob
  • X here doesn't have to denote a single letter: "For example the words AA, CABABC, ABCABC all aren't square-free." So ABCDABCD is just (ABCD)(ABCD), an instance of the first forbidden form XX. – Noah Schweber Feb 21 '19 at 23:28
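A tiny Python check of the comment's point (the helper `find_square` is illustrative): looking only at letters one apart misses repeats of longer blocks.

```python
def find_square(word: str):
    """Return (start, X) for the first factor XX in word, or None."""
    n = len(word)
    for start in range(n):
        for L in range(1, (n - start) // 2 + 1):
            x = word[start:start + L]
            if word[start + L:start + 2 * L] == x:
                return start, x
    return None


print(find_square("ABCD"))      # None
print(find_square("ABCDABCD"))  # (0, 'ABCD')
```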