Let's say we are talking about addition defined in the real numbers. Then, by induction we define $\sum_{i=0}^{0}a_i=a_0$ and $\sum_{i=0}^{n}a_i=\sum_{i=0}^{n-1}a_i+a_n$ for $n\geq 1$.

Now, how do you define general associativity? I know that this has something to do with the fact that $\sum_{i=0}^{n}a_i=\sum_{i=0}^{k}a_i+\sum_{i=k+1}^{n}a_i$ for $0\leq k<n$, where $\sum_{i=k}^{n}a_i=\sum_{i=0}^{n-k}a_{i+k}$ by definition. But the thing is that this doesn't quite capture the notion of different ways of arranging brackets, like for example $(a_0+(a_1+a_2))+((a_3+a_4)+(a_5+a_6))$.

So my question is how one formally defines this process of bracketing. Think of the case when someone just tells you to prove general associativity for the real numbers. How do they actually define this property in order to prove it? Is it necessary to have a definition at all?

Look, for example, at this proof, specifically at the point where Professor M. Zuker says: "Let us now assume that any bracketing of $a_1, a_2,...,a_k$ equals the standard form for $1\leq k\leq n-1$, where $n>3$". But then again my question: what is the definition of bracketing? Is it actually necessary to have a definition of bracketing, or does this proof work whatever the definition of bracketing is?

Also I have found this paper by William P. Wardlaw, "A Generalized General Associative Law", that contains different proofs of this general associativity law. The first one, which he names as his favorite, is by Nathan Jacobson in his book "Lectures in Abstract Algebra", Vol. 1, page 20. At one point in this proof he says "Consider now any product associated with $(a_1, a_2,..., a_n)$...", which means "any bracketing associated with...". Then again the same question.

I hope you understand my point. If not, please feel free to ask anything related to my question.


For clarification, let's say we are talking about sums in the real numbers. Then,

1.- $(...(((a_0+a_1)+a_2)+a_3)...+a_n)$ is a representation of the formal definition by recursion, meaning $\sum_{i=0}^{n}a_i$ just as defined above: $\sum_{i=0}^{0}a_i=a_0$ and $\sum_{i=0}^{n}a_i=\sum_{i=0}^{n-1}a_i+a_n$ for $n\geq 1$.

2.- What is the formal definition of $a_0+(a_1+(a_2+\dots+(a_{n-1}+a_n)\dots))$?

3.- What is the formal definition of something like $(a_0+((a_1+a_2)+a_3))+(((a_4+a_5)+a_6)+\dots+(a_{n-1}+a_n))$?

Daniela Diaz
  • Usually it is just stated as an axiom. $\forall A,B,C((A*B)*C=A*(B*C))$. – JustAskin Mar 12 '14 at 07:33
  • @Justaskin_ Yes. The associativity property is stated as an axiom for any three objects, but what about when you need to extend it to an arbitrary $n$? How do you define "general associativity" if someone tells you "prove the general associativity property"? – Daniela Diaz Mar 12 '14 at 07:38
  • Have you considered proof by induction? You already have your base case (it's an axiom), now assume it holds for $n$ elements and prove that it holds for $n+1$. – JustAskin Mar 12 '14 at 10:28
  • @Justakin Well, that's exactly what everybody does and I don't understand this step. What is the meaning of saying "let's say that associativity holds for n"? What is "associativity for n"? – Daniela Diaz Mar 12 '14 at 15:44
  • @seaturtles Yes, this works for $4$ elements. How do you state this fact when it comes to an arbitrary number $n$ of elements? Is it necessary to have a definition to actually prove the fact that different bracketings yield the same result? – Daniela Diaz Apr 17 '14 at 00:40
  • So, you actually *do* understand what general associativity is, you just don't know how to say what it is? – blue Apr 17 '14 at 01:31
  • @seaturtles Exactly! Whenever I ask someone to prove general associativity they say that the proof is by induction, and they take for granted that general associativity is valid for $n$ and then prove that it is valid for $n+1$, but they never formally state what general associativity means, and that's what I'd like to know. – Daniela Diaz Apr 17 '14 at 01:39
  • I think it's a very good question, +1. You could certainly define an abstract bracketing as a rooted planar binary tree. Maybe this is really overstating it, but "general associativity" appears as some kind of coherence theorem, not unlike the coherence theorem of Mac Lane. – Olivier Bégassat Apr 17 '14 at 01:49
  • I don't have time right now to give this question much attention (and it looks like several others do), but maybe my 12 September 2006 sci.math post [combinatorics of associativity](http://mathforum.org/kb/message.jspa?messageID=5123761) will be of interest. If nothing else, you'll find a lot of literature references there. – Dave L. Renfro Apr 17 '14 at 20:42

7 Answers


This is a really good question and it comes up (among other places) in tensor category theory, where equality is replaced by canonical isomorphism and suddenly things cannot be handwaved away... I'll try to give an answer; unfortunately it is lacking in illustrations (due to math.stackexchange not having packages like xypic) and examples (due to my laziness). Sorry also for its length...

I will write $R$ for $\mathbb R$ because really there is nothing specific about $\mathbb R$ that you are using here (and, honestly, because I am too lazy to type in the \mathbb part all the time). Here are two ways of stating general associativity in $R$:

First way: Here is a mild variation on the way to state "general associativity" that you have suggested: We can define, for every positive integer $n$, a map $\operatorname{sum}_n : R^n \to R$; we define it by induction over $n$:

  • The map $\operatorname{sum}_1 : R^1 \to R$ is defined as the map sending each $\left(r\right) \in R^1$ to $r$.

  • If $\operatorname{sum}_n : R^n \to R$ is defined for some $n$, then $\operatorname{sum}_{n+1} : R^{n+1} \to R$ is defined by $\operatorname{sum}_{n+1} \left(a_1,a_2,...,a_{n+1}\right) = \operatorname{sum}_n \left(a_1,a_2,...,a_n\right) + a_{n+1}$.

Then, "general associativity" states that whenever $n$ and $k$ are integers such that $0 < k < n$, and whenever $\left(r_1,r_2,...,r_n\right)\in R^n$ is an $n$-tuple, we have

$\operatorname{sum}_n \left(r_1,r_2,...,r_n\right) = \operatorname{sum}_k \left(r_1,r_2,...,r_k\right) + \operatorname{sum}_{n-k} \left(r_{k+1},r_{k+2},...,r_n\right)$.
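For readers who like to experiment, here is a minimal Python sketch of the First way. The names `sum_n` and `check_split` are hypothetical (they are not from any library), and integers stand in for real numbers so that the equality tests are exact. It only checks the statement on examples; it proves nothing.

```python
def sum_n(values):
    """Left-nested sum: sum_1((r)) = r and
    sum_{n+1}(a_1, ..., a_{n+1}) = sum_n(a_1, ..., a_n) + a_{n+1}."""
    if len(values) == 0:
        raise ValueError("sum_n is only defined for n >= 1")
    if len(values) == 1:
        return values[0]
    return sum_n(values[:-1]) + values[-1]


def check_split(values):
    """Check sum_n(r_1..r_n) == sum_k(r_1..r_k) + sum_{n-k}(r_{k+1}..r_n)
    for every k with 0 < k < n."""
    n = len(values)
    return all(
        sum_n(values) == sum_n(values[:k]) + sum_n(values[k:])
        for k in range(1, n)
    )


print(check_split((3, -1, 4, 1, -5, 9)))
```

Note that `check_split` only moves the outermost pair of brackets, which is exactly the limitation of this formulation discussed next.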

So you don't like this formulation, because rather than explicitly formalizing the concept of a bracketing, it only formalizes its "first step" (i.e., it allows "moving the outermost brackets"). Before I continue, let me say that for most applications this is perfectly enough, as you can use induction instead of referring to some obscure notion of "bracketings" and your proofs will likely only gain in clarity. But sometimes one really has to go the final mile. This is more complicated:

Second way: The best way to formalize the concept of a bracketing (for a binary operation) is the notion of a binary tree. The Wikipedia page on this is quite confusing, and the other relevant Wikipedia pages (associahedron, Tamari lattice) are not very elementary, so let me try to provide a definition. The simplest way to define an (unlabelled) binary tree with $n$ leaves (for $n$ a positive integer) is to say that:

  • the empty set $\emptyset$ is a binary tree with $1$ leaf, called the empty binary tree (don't think of the leaves as any actual objects -- all we care about here is the number of leaves of a tree, which is just a number that we keep track of);

  • if $A$ and $B$ are two binary trees with $a$ and $b$ leaves, respectively, then the pair $\left(A,B\right)$ is a binary tree with $a+b$ leaves; this tree is said to have left child $A$ and right child $B$ (bad terminology if you ask me, since these children are used to create the tree...).

This is an inductive definition, meaning that we consider all objects which are forced to be binary trees by these axioms to be binary trees, and no others.

So each of $\emptyset$, $\left(\emptyset, \emptyset\right)$, $\left(\left(\emptyset, \emptyset\right), \emptyset\right)$, $\left(\emptyset,\left(\left(\emptyset,\left(\emptyset,\emptyset\right)\right),\emptyset\right)\right)$ is a binary tree. The triple $\left(\emptyset,\emptyset,\emptyset\right)$ is not a binary tree, because we have no axiom that could make a triple a binary tree.

(Caveat lector: Binary trees are not trees in the sense of graph theory, and actually not trees in the sense of computer science either. Normally, "trees" are not allowed to be empty, and there is no distinction between left and right children. Here the empty tree exists and so does the left-right distinction: (for example) the binary trees $\left(\left(\emptyset, \emptyset\right), \emptyset\right)$ and $\left(\emptyset,\left(\emptyset, \emptyset\right)\right)$ are not the same.)

Of course, binary trees are so called because they can be drawn as trees. To draw the empty binary tree, draw a single dot, which is called the "root" of the tree. To draw a tree of the form $\left(A,B\right)$, you need to draw a dot, which stands for the "root" of this tree, and then one edge from this "root" that goes southwest and another edge from this "root" that goes southeast. Now, draw the tree $A$ (this is a recursive algorithm) in such a way that its "root" is placed at the end of the southwest edge. Also, draw the tree $B$ in such a way that its "root" is placed at the end of the southeast edge.

Sadly I cannot draw on here, but if you look at this image and pretend that $\alpha$, $\beta$, $\gamma$ and the circles are dots, then the left binary tree is $\left(\emptyset, \left(\emptyset, \emptyset\right)\right)$, and the right binary tree is $\left(\left(\emptyset, \emptyset\right), \emptyset\right)$. Also, in this picture (think of the halfmoons/bananas as dots too) you see all five possible binary trees with $4$ leaves. (Yes, the "leaves" are the halfmoons/bananas; they are dots from which no edges go down.)

Now, to any positive integer $n$ and any binary tree $T$ with $n$ leaves, we can assign a map $s_T : R^n \to R$; it is defined (recursively) as follows:

  • If $T$ is the empty binary tree (so that $n=1$), then $s_T : R^1 \to R$ is the map sending each $\left(r\right)$ to $r$.

  • If $T$ has the form $\left(A,B\right)$ for two binary trees $A$ and $B$ having respectively $a$ and $b$ leaves, then we already have maps $s_A : R^a \to R$ and $s_B : R^b \to R$ (this is an inductive definition), and we need to define a map $s_T : R^{a+b} \to R$. Define it by $s_T\left(r_1,r_2,...,r_{a+b}\right) = s_A\left(r_1,r_2,...,r_a\right) + s_B\left(r_{a+1},r_{a+2},...,r_{a+b}\right)$.

With the maps $s_T$ thus defined, "general associativity" says that for any fixed positive integer $n$, the maps $s_T : R^n \to R$ for all possible binary trees $T$ with $n$ leaves are the same.
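The definitions of binary trees and of the maps $s_T$ translate almost literally into code. The following Python sketch is only an illustration under my own conventions: `None` stands in for the empty binary tree, a Python pair `(A, B)` for the tree with children $A$ and $B$, and the names `trees`, `leaves` and `s` are hypothetical. The binary operation is passed in as `op`, so that a non-associative operation can be tried as well.

```python
from functools import lru_cache

LEAF = None  # stands in for the empty binary tree, which has 1 leaf


@lru_cache(maxsize=None)
def trees(n):
    """Return a tuple of all binary trees with n leaves (n >= 1)."""
    if n == 1:
        return (LEAF,)
    result = []
    for a in range(1, n):  # a leaves in the left child, n - a in the right child
        for left in trees(a):
            for right in trees(n - a):
                result.append((left, right))
    return tuple(result)


def leaves(tree):
    """Number of leaves of a binary tree."""
    if tree is LEAF:
        return 1
    left, right = tree
    return leaves(left) + leaves(right)


def s(tree, values, op):
    """The map s_T, with the binary operation supplied as `op`."""
    if tree is LEAF:
        (r,) = values  # exactly one value is expected here
        return r
    left, right = tree
    a = leaves(left)
    return op(s(left, values[:a], op), s(right, values[a:], op))


def add(x, y):
    return x + y


def sub(x, y):
    return x - y


values = (1, 2, 3, 4)
print(len(trees(4)))                                  # how many trees have 4 leaves
print({s(t, values, add) for t in trees(4)})          # values under addition
print(sorted({s(t, values, sub) for t in trees(4)}))  # values under subtraction
```

For addition the set of values has a single element, which is what "general associativity" predicts; for subtraction, which is not associative, several different values appear.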

Let us see what this means for $n = 1$. There is only one binary tree with $1$ leaf, namely the empty tree, and the corresponding map sends each $\left(r_1\right) \in R^1$ to $r_1$. So the statement we are making is trivial for $n = 1$.

Let us see what "general associativity" means for $n = 2$. There is only one binary tree with $2$ leaves, namely the tree $\left(\emptyset,\emptyset\right)$, and the corresponding map sends each $\left(r_1,r_2\right) \in R^2$ to $r_1 + r_2$. Again the statement is trivial.

The first nontrivial consequence is what we get for $n = 3$. There are two binary trees with $3$ leaves, namely $\left(\emptyset, \left(\emptyset, \emptyset\right)\right)$ and $\left(\left(\emptyset, \emptyset\right), \emptyset\right)$. The $s_T$ for the first of these trees is the map sending each $\left(r_1,r_2,r_3\right) \in R^3$ to $r_1 + \left(r_2+r_3\right)$. The $s_T$ for the second of these trees is the map sending each $\left(r_1,r_2,r_3\right) \in R^3$ to $\left(r_1+r_2\right) + r_3$. So "general associativity" for $n=3$ yields that $r_1 + \left(r_2+r_3\right) = \left(r_1+r_2\right) + r_3$ for all $r_1,r_2,r_3 \in R$. This is just the usual associativity law.

For $n = 4$, there are five binary trees with $4$ leaves, which I list here along with the corresponding maps $s_T$:

$\left(\emptyset, \left(\emptyset, \left(\emptyset,\emptyset\right)\right)\right)$, with $s_T$ sending $\left(r_1,r_2,r_3,r_4\right) \in R^4$ to $r_1+\left(r_2+\left(r_3+r_4\right)\right)$;

$\left(\emptyset, \left(\left(\emptyset,\emptyset\right), \emptyset\right)\right)$, with $s_T$ sending $\left(r_1,r_2,r_3,r_4\right) \in R^4$ to $r_1+\left(\left(r_2+r_3\right)+r_4\right)$;

$\left(\left(\emptyset, \emptyset\right), \left(\emptyset,\emptyset\right)\right)$, with $s_T$ sending $\left(r_1,r_2,r_3,r_4\right) \in R^4$ to $\left(r_1+r_2\right)+\left(r_3+r_4\right)$;

$\left(\left(\emptyset, \left(\emptyset,\emptyset\right)\right), \emptyset\right)$, with $s_T$ sending $\left(r_1,r_2,r_3,r_4\right) \in R^4$ to $\left(r_1+\left(r_2+r_3\right)\right)+r_4$;

$\left(\left(\left(\emptyset,\emptyset\right), \emptyset\right), \emptyset\right)$, with $s_T$ sending $\left(r_1,r_2,r_3,r_4\right) \in R^4$ to $\left(\left(r_1+r_2\right)+r_3\right)+r_4$.

So "general associativity" for $n=4$ says that

$r_1+\left(r_2+\left(r_3+r_4\right)\right) = r_1+\left(\left(r_2+r_3\right)+r_4\right) = \left(r_1+r_2\right)+\left(r_3+r_4\right) = \left(r_1+\left(r_2+r_3\right)\right)+r_4 = \left(\left(r_1+r_2\right)+r_3\right)+r_4$.

In general, the "bracketings" of $n$ symbols using a fixed binary operation correspond to the binary trees with $n$ leaves.

Now this got damn long and nowhere near readable. Can any LaTeX wizard step in and suggest a simple drawing package that does work on math.stackexchange? It doesn't take much to draw a tree, and it should clear up a lot...

EDIT: Here is a third way, which really is completely equivalent to the second way, but has the advantage of a less confusing definition. The idea is to use Dyck words instead of trees. (The Wikipedia page on Catalan numbers actually provides a good synopsis of what I am saying -- and some good pictures of binary trees. It calls "full binary tree" what I call "binary tree".)

Third way: What is a Dyck word? In combinatorics, "word" is just a fancy word (oops) for "tuple" (usually finite), and the "letters" of the word simply mean the entries of the tuple. Unless there is risk of confusion with actual products (or numbers), one abbreviates a tuple (or word) $\left(a_1,a_2,...,a_k\right)$ as $a_1a_2...a_k$. So the word $\left(0,1,1,1,0\right)$ is written as $01110$.

Now, let $n \in \mathbb{N}$. A Dyck word of length $2n$ is defined to be a word whose letters are $0$'s and $1$'s, each appearing $n$ times in total (so the word has altogether $2n$ letters), such that for every $i$, there are at least as many $0$'s among the first $i$ letters as there are $1$'s among them. So, for example, $0110$ is not a Dyck word, because among the first $3$ letters there are fewer $0$'s than $1$'s (namely, one $0$ and two $1$'s). Also, $10001$ is not a Dyck word, because among the first $1$ letter there are fewer $0$'s than $1$'s (no $0$ at all and one $1$). Also, $0100$ is not Dyck because its total number of $0$'s does not equal its total number of $1$'s. But $001011$ is a Dyck word (of length $2\cdot 3=6$), as you can easily check: read the word from left to right, keeping track of how many $0$'s and how many $1$'s you have encountered; if the count for $1$'s overtakes the count for $0$'s, then your word is not Dyck; if both counts are equal at the end, then it is Dyck; otherwise it is not Dyck.

(Wikipedia uses the two letters "X" and "Y" instead of $0$ and $1$; other than this, there is no difference.)
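The left-to-right check described above is easy to write down. Here is a small Python sketch; the function name `is_dyck` is hypothetical, and words are represented as strings of the characters `'0'` and `'1'`.

```python
def is_dyck(word):
    """Return True if `word` (a string of '0' and '1') is a Dyck word."""
    zeros = ones = 0
    for letter in word:
        if letter == "0":
            zeros += 1
        elif letter == "1":
            ones += 1
        else:
            raise ValueError("letters must be '0' or '1'")
        if ones > zeros:  # the count of 1's has overtaken the count of 0's
            return False
    return zeros == ones  # both counts must agree at the end


for w in ["0110", "10001", "0100", "001011", ""]:
    print(repr(w), is_dyck(w))
```

The loop covers the three non-examples and the example from the text, plus the empty word (which is a Dyck word of length $0$).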

Now, to any nonnegative integer $n$ and any Dyck word $w$ of length $2n$, we can assign a map $s_w : R^{n+1} \to R$; it is defined (recursively) as follows:

  • If $n = 0$ (so that the word $w$ is empty -- a $0$-tuple), then $s_w : R^1 \to R$ is the map sending each $\left(r\right)$ to $r$.

  • Otherwise, let $j$ be the smallest positive $i$ such that the number of $0$'s among the first $i$ letters of $w$ equals the number of $1$'s among the first $i$ letters of $w$. (Such an $i$ exists, because $i = 2n$ does the trick; thus, the smallest such $i$ also exists.) Notice that this $j$ must be even, because otherwise the number of $0$'s among the first $j$ letters of $w$ could not equal the number of $1$'s among these letters for parity reasons. Write $w$ as $w_1w_2...w_{2n} = \left(w_1,w_2,...,w_{2n}\right)$. (Then, it is easy to see that $w_1 = 0$ and $w_j = 1$.) Let $u$ be the word $w_2w_3...w_{j-1}$, and let $v$ be the word $w_{j+1}w_{j+2}...w_{2n}$. (It is easy to see that both $u$ and $v$ are Dyck words.) Then, we define a map $s_w : R^{n+1} \to R$ by $s_w\left(r_1,r_2,...,r_{n+1}\right) = s_u\left(r_1,r_2,...,r_{j/2}\right) + s_v\left(r_{j/2+1},r_{j/2+2},...,r_{n+1}\right)$.

With the maps $s_w$ thus defined, "general associativity" says that for any fixed positive integer $n$, the maps $s_w : R^{n+1} \to R$ for all possible Dyck words of length $2n$ are the same.

Let us see what this means for $n = 1$. We have only one Dyck word of length $2\cdot 1 = 2$, namely $01$. Let this word be $w$. Looking up the definition of $s_w$, we see that $u$ and $v$ both are the empty word. Hence, every $\left(r_1,r_2\right) \in R^2$ satisfies $s_w\left(r_1,r_2\right) = s_u\left(r_1\right) + s_v\left(r_2\right) = r_1+r_2$. Of course, "general associativity" says nothing insightful, because there is only one $w$.

Let us see what we get for $n = 2$. We have two Dyck words of length $2\cdot 2 = 4$, namely $0011$ and $0101$.

Let $w$ be the Dyck word $0011$. Then, in our above definition of $s_w$, we have $j = 4$, $u = 01$ and $v = \left(\text{empty word}\right)$. Hence, every $\left(r_1,r_2,r_3\right) \in R^3$ satisfies

$s_w\left(r_1,r_2,r_3\right) = \underbrace{s_u\left(r_1,r_2\right)}_{=r_1+r_2} + \underbrace{s_v\left(r_3\right)}_{=r_3} = \left(r_1+r_2\right) + r_3$.

Similarly, if $w$ is the Dyck word $0101$, then $s_w\left(r_1,r_2,r_3\right) = r_1 + \left(r_2+r_3\right)$. So "general associativity" for $n = 2$ claims $\left(r_1+r_2\right) + r_3 = r_1 + \left(r_2+r_3\right)$, which is exactly what you would expect (except that the $n$ here is shifted by $1$ compared to the second way above).
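The two computations above can be replayed mechanically. Below is a minimal Python sketch of the recursive definition of $s_w$; the name `s_w` for the function, the string representation of words, and the helper `show` (which builds a bracketed string instead of adding numbers) are all my own choices for illustration. The sketch assumes its input really is a Dyck word and does not verify this.

```python
def s_w(word, values, op):
    """The map s_w for a Dyck word of length 2n, applied to n + 1 values.

    Assumes `word` is a Dyck word written as a string of '0' and '1'.
    """
    n = len(word) // 2
    if len(values) != n + 1:
        raise ValueError("a Dyck word of length 2n needs n + 1 values")
    if n == 0:
        return values[0]
    # j = smallest positive i such that the first i letters contain
    # equally many 0's and 1's
    balance = 0
    for i, letter in enumerate(word, start=1):
        balance += 1 if letter == "0" else -1
        if balance == 0:
            j = i
            break
    u = word[1:j - 1]  # w_2 ... w_{j-1}
    v = word[j:]       # w_{j+1} ... w_{2n}
    half = j // 2
    return op(s_w(u, values[:half], op), s_w(v, values[half:], op))


def show(x, y):
    """Record the bracketing as a string instead of computing a number."""
    return f"({x}+{y})"


print(s_w("0011", ("r1", "r2", "r3"), show))
print(s_w("0101", ("r1", "r2", "r3"), show))
```

Passing `show` as the operation makes the bracketing encoded by each Dyck word visible; passing ordinary addition instead gives the numerical value.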

You asked for references. Unfortunately I also know no better approach than to google for "bracketings and associativity", and it is very hard to find reader-friendly treatments of these formalization issues. There is Huang and Tamari's famous Problems of associativity paper, but this is not about formalizing associativity; it is about the combinatorics of the bracketings (or Dyck words or binary trees). A whole book dedicated to this kind of combinatorics (Associahedra, Tamari Lattices and Related Structures) was published in 2012, and you can find all (or most?) of its chapters on the internet (for example, Ross Street's Parenthetic Remarks) if you are interested. Sadly, this is again on a level where questions like "how is a bracketing defined?" appear as too trivial. These foundational issues are usually ignored when they arise in elementary algebra, because authors are too lazy to discuss them or do not expect their readers to put up with the large amount of formalism and pedantry involved in their discussion. But sometimes people care about them when they reappear in the theory of tensor categories (aka monoidal categories), because they are less easy to argue away in that situation (and the First way I showed above is often not enough for tensor categories); many nevertheless try to. Mac Lane, in his Natural associativity and commutativity, uses "iterates" of the associativity morphism in a category to encode our bracketings. It is not very explicit, but his paper seems fairly well-written (though you have to know some category theory...).

I should also say that a few good algebra texts prove "general associativity" formalized according to the First way I sketched above, or in similar veins. See, for instance, Theorem 2 in §I.1 of Claude Chevalley, Fundamental Concepts of Algebra, 1956, which gives a slightly stronger version of the First way.

darij grinberg
  • Following your ideas I have arrived at the conclusion that, whatever the definition of "any bracketing" associated with $a_0,...,a_n$ is, there are two properties that they all have in common: (1) for $n=0$ they all equal $a_0$ and (2) for $n\geq 1$ they all should be of the form $s(a_0,...,a_k)+s(a_{k+1},...,a_{n})$ where each $s$ is some "bracketing". These two properties are essentially what is used in most proofs by induction. Thanks so much for your answer. I would have liked you to include some references, though. – Daniela Diaz Apr 21 '14 at 18:27
  • Thanks for the accept! I'll come back to this later this week hopefully; the conference right now is taking up most of my time. There is a lot of literature on bracketings and trees, though mostly not with foundational questions in mind but rather with concrete combinatorial questions such as "what does the graph on all possible bracketings (aka binary trees) look like if two bracketings are connected to each other by an edge whenever one can get from one to the other by one single application of $x(yz) = (xy)z$?". See e.g. the beginning (§2.1-2.2) of http://arxiv.org/abs/1109.5296 . – darij grinberg Apr 22 '14 at 06:05
  • Edited with *some* references added in; I am disappointed by the lack of really relevant ones myself. Also, I fixed my examples of trees since there were some mistakes in them; sorry! – darij grinberg May 04 '14 at 06:16

Here is a way to formally define the process of bracketing and a proof that associativity implies general associativity.

Consider the context-free grammar $S\to\bullet,\,S\to(SS)$, which generates all sentences with matching brackets in expressions of a binary operator: $$\bullet,(\bullet\bullet),(\bullet(\bullet\bullet)),\dots$$ Let $P$ be the set of all these sentences. Then $P$ is a free magma with the operation $(p,q)\mapsto(pq)\in P.$ That $P$ is free implies in particular that $(p\hat p)=(q\hat q)\implies p=q \wedge \hat p=\hat q$.

Define the degree of $p\in P$ as $|p|=1$ if $p=\bullet$ and $|p|=|p_1|+|p_2|$ if $p=(p_1 p_2)$. The degree of $p$ is the number of occurrences of the sign $\bullet$ in $p$. Let $P_n=\{p\in P|\;\;|p|=n\}$, $n>0$. Also define $l_n\in P_n$ by $\;l_1=\bullet\;$ and $\;l_{n+1}=(l_{n}\bullet)\;$ for $n\geq 1$.

Let $(M,\cdot)$ be a magma and let $x_1,\dots,x_n\in M$. If $\;p\in P_n$, then $p$ can be applied to $x_1,\dots,x_n$ in the obvious way: going from left to right, replace the $k$-th occurrence of $\bullet$ with $x_k$, for $k=1,\dots, n$, and read each $(uv)$ as $u\cdot v$. The result is written $p(x_1,\dots,x_n)\in M$.
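As an illustration of how a sentence $p\in P_n$ is applied to elements, here is a rough Python sketch. Everything in it is an assumption made for the example: sentences are strings over the three symbols `•`, `(` and `)`, and the names `parse`, `degree`, `apply_pattern` and `l` are hypothetical.

```python
def parse(pattern):
    """Parse a sentence of the grammar S -> • | (SS) into nested pairs."""
    def helper(pos):
        if pos >= len(pattern):
            raise ValueError("sentence ended unexpectedly")
        if pattern[pos] == "•":
            return "•", pos + 1
        if pattern[pos] != "(":
            raise ValueError(f"unexpected symbol at position {pos}")
        left, pos = helper(pos + 1)
        right, pos = helper(pos)
        if pos >= len(pattern) or pattern[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        return (left, right), pos + 1

    tree, end = helper(0)
    if end != len(pattern):
        raise ValueError("trailing symbols after a complete sentence")
    return tree


def degree(tree):
    """|p|: the number of bullets in p."""
    return 1 if tree == "•" else degree(tree[0]) + degree(tree[1])


def apply_pattern(tree, xs, op):
    """p(x_1, ..., x_n): fill in the bullets from left to right, then evaluate."""
    if degree(tree) != len(xs):
        raise ValueError("need exactly |p| elements")
    remaining = iter(xs)

    def go(t):
        if t == "•":
            return next(remaining)
        left = go(t[0])   # the left part is filled in first
        right = go(t[1])
        return op(left, right)

    return go(tree)


def l(n):
    """The left-nested sentence l_n, with l_1 = • and l_{n+1} = (l_n •)."""
    return "•" if n == 1 else "(" + l(n - 1) + "•)"


def sub(x, y):
    return x - y


print(l(3))
print(apply_pattern(parse("((••)•)"), (1, 2, 3), sub))
print(apply_pattern(parse("(•(••))"), (1, 2, 3), sub))
```

Subtraction is used on purpose: it is not associative, so the two elements of $P_3$ give different results, which shows that the statement $S(3)$ really depends on associativity.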

Proof of general associativity by induction. Consider the statement: $$S(n):\quad \forall x_1,\dots,x_n\in M\,\forall p,q\in P_n:p(x_1,\dots,x_n)=q(x_1,\dots,x_n)$$ $S(1)$ and $S(2)$ hold trivially, since $P_1$ and $P_2$ each have a single element. If $M$ is associative then $S(3)$ is true since $P_3=\{(\bullet(\bullet\bullet)),((\bullet\bullet)\bullet)\}$. Now suppose that $n>3$ and that $S(m)$ is true for all $m<n$. For $p,q\in P_n$ there are $p^\prime,\hat p, \,q^\prime,\hat q\in P$ and $0<i,j<n$ such that: $$\begin{cases} p(x_1,\dots,x_n)=p^\prime(x_1,\dots,x_i)\cdot\hat p(x_{i+1},\dots,x_n) \\ q(x_1,\dots,x_n)=q^\prime(x_1,\dots,x_j)\cdot\hat q(x_{j+1},\dots,x_n) \end{cases}$$ If $i=j$, then $p^\prime(x_1,\dots,x_i)=q^\prime(x_1,\dots,x_i)$ and $\hat p(x_{i+1},\dots,x_n)=\hat q(x_{i+1},\dots,x_n)$, because $i<n$ and $n-i<n$. [In particular, $l_i(x_1,\dots,x_i)=p^\prime(x_1,\dots,x_i)$]. Otherwise, suppose without loss of generality that $i>j$; then

$p(x_1,\dots,x_n)=l_i(x_1,\dots,x_i)\cdot l_{n-i}(x_{i+1},\dots,x_n)=$ $\Big(l_j(x_1,\dots,x_j)\cdot l_{i-j}(x_{j+1},\dots,x_i)\Big)\cdot l_{n-i}(x_{i+1},\dots,x_n)=$ $l_j(x_1,\dots,x_j)\cdot\Big(l_{i-j}(x_{j+1},\dots,x_i)\cdot l_{n-i}(x_{i+1},\dots,x_n)\Big)=$ $q^\prime(x_1,\dots,x_j)\cdot\Big(l_{i-j}(x_{j+1},\dots,x_i)\cdot l_{n-i}(x_{i+1},\dots,x_n)\Big)=$ $q^\prime(x_1,\dots,x_j)\cdot l_{n-j}(x_{j+1},\dots,x_n)=$ $q^\prime(x_1,\dots,x_j)\cdot \hat q(x_{j+1},\dots,x_n)=q(x_1,\dots,x_n)\quad$ QED.


The meaning of “associativity of the binary operation $\cdot$ on the set $A$ holds for $k$ (items)” is (as Wardlaw writes) that “[If $a_1, a_2, \dots,a_k$ are elements of $A$, then] any bracketing of $a_1,a_2,\dots,a_k$ equals the standard form.”

When we say a binary operation is associative (without mentioning “for $k$”), we mean it’s associative for all positive integers $k$. That means we can write $\prod\limits_{i=1}^n a_i$ and know that (regardless of how big $n$ is) it’s well-defined, because the way in which we evaluate it doesn’t matter.

The standard form in the definition is (or can be taken to be) Wardlaw’s left associative product, $$\left(\dots\left(\left(a_1\cdot a_2\right)\cdot a_3\right)\dots\cdot a_k\right),$$

which is the element of $A$ obtained by finding $a_1\cdot a_2$, multiplying it by $a_3$, and so on, up to a final product with $a_k$.

For the definition to be entirely clear, one should also know what’s meant by any bracketing.

“Any bracketing” means the result of any particular sequence taken from among all the possible sequences that can be used to transform $a_1,a_2,\dots,a_k$ into an element of $A$ using $k-1$ applications of $\cdot$.

Bracketings are usually expressed with parenthesization. Example: For $k=5$, this is one of the bracketings that exist:

$$\left(\left(a_1\cdot a_2\right)\cdot \left(\left(a_3\cdot a_4\right)\cdot a_5\right)\right).$$

This is the element of $A$ obtained by applying $\cdot\,$ according to the parenthesization. (Where two applications are performed on the same line, the result doesn’t depend on the order in which those evaluations are made.)

$$ \begin{align} \left(\left(a_1\cdot a_2\right)\cdot \left(\left(a_3\cdot a_4\right)\cdot a_5\right)\right)&= \overbrace{(a_1\cdot a_2)}^{\mathrm{Let}\, v=a_1\cdot a_2} \cdot \left(\overbrace{(a_3\cdot a_4)}^{\mathrm{Let}\, w=a_3\cdot a_4}\cdot a_5\right)\\ &= v\cdot\overbrace{(w\cdot a_5)}^{\mathrm{Let}\, x=w\cdot a_5}\\ &=\overbrace{\left(v\cdot x\right)}^{\mathrm{Let}\, z=v\cdot x}\\ &=z. \end{align} $$

By the way, so long as a “standard form” for bracketing $a_1,a_2,\dots,a_k$ is well-defined, its specific form is irrelevant, since if all forms of bracketing $a_1,a_2,\dots,a_k$ are equal to the same specific one of the possible bracketings, associativity holds.

Does that help?

Steve Kass

To put the question in a broader perspective, it all comes down to an unfortunate notational convention.

A binary operation is just a function $f:X \times X \rightarrow X$, where $X$ is a set, EXCEPT that it has the peculiar convention of writing the function with "infix notation" like 1+2 rather than ordinary function notation like, say, "add(1,2)".

"Bracketing" is the price we pay for using infix notation, since without bracketing, expressions like 12 / 6 / 2 are ambiguous $-$ does that mean (12/6)/2, which equals 1, or 12/(6/2), which equals 4? And since we don't like to bother writing so many parentheses all the time, we even go through the trouble of establishing conventional rules about order of operations: The rules that say that 1+3*4 implicitly means 1+(3*4) and 5 - 4 - 3 means (5-4)-3. In elementary school you had to pay the price of learning all these additional rules.

If we all used ordinary function notation instead, we wouldn't need bracketing or conventions about precedence: Instead of (9 - 15) / 3 we just write $\operatorname{divide}( \operatorname{subtract}(9,15), 3)$. Instead of a complicated "bracketing" like $$(a_0+(a_1+a_2))+((a_3+a_4)+(a_5+a_6)),$$ we would just write $$\operatorname{add}(\operatorname{add}(a_0,\operatorname{add}(a_1,a_2)),\operatorname{add}(\operatorname{add}(a_3,a_4),\operatorname{add}(a_5,a_6))).$$

You can treat this as the formal definition. (You can even create an algorithm to do this kind of conversion automatically; algorithms of this kind are well-known.) In fact, to take things a little further, the parentheses and commas aren't even needed. We could just write

/ - 9 15 3
+ + a0 + a1 a2 + + a3 a4 + a5 a6

for the two examples above. There is no ambiguity. (Make sure you see why.) This kind of notation, which requires no parentheses, is called "prefix" or "Polish" notation. Again, it's really just ordinary function notation except we recognized that we don't need to bother writing out all the parentheses and commas.

(There is also "postfix" or "reverse Polish" notation, where (9-15)/3 would be expressed as 9 15 - 3 /. For the purpose of algorithm computation, postfix notation is the much more natural choice. But we'll stick to prefix since it closely corresponds to familiar function notation.)

In the following examples, the first two constitute all the possible "bracketings" of a binary operation $f$ over $a_0,a_1,a_2$; the third is not legal:

f a0 f a1 a2
f f a0 a1 a2
f a0 a1 f a2

Again, the first is equivalent to $f(a_0, f(a_1,a_2))$ and the second is equivalent to $f(f(a_0,a_1),a_2)$. The third cannot be parsed. (Aside: In general, what's an easy way to tell whether an expression with f's and a's is legal? Read the list from left to right. As you read, keep track of how many f's and how many a's you've seen so far. The running count of a's should be greater than the running count of f's precisely when you reach the end of the list, and no sooner.)
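The counting rule in the aside, and the conversion back to ordinary function notation, can be sketched in a few lines of Python. This is only an illustration: the names `is_legal_prefix` and `to_function_notation` are hypothetical, an expression is a list of tokens, and every token other than `f` is treated as an operand.

```python
def is_legal_prefix(tokens):
    """Check the counting rule: operands may outnumber f's only at the end."""
    fs = operands = 0
    for position, token in enumerate(tokens, start=1):
        if token == "f":
            fs += 1
        else:
            operands += 1
        if operands > fs:
            # the first time operands outnumber f's must be the last token
            return position == len(tokens)
    return False


def to_function_notation(tokens):
    """Convert a legal prefix expression into ordinary function notation."""
    if not is_legal_prefix(tokens):
        raise ValueError("not a legal prefix expression")

    def parse(pos):
        if tokens[pos] != "f":
            return tokens[pos], pos + 1
        left, pos = parse(pos + 1)   # first argument of f
        right, pos = parse(pos)      # second argument of f
        return f"f({left}, {right})", pos

    text, _ = parse(0)
    return text


for expression in ["f a0 f a1 a2", "f f a0 a1 a2", "f a0 a1 f a2"]:
    tokens = expression.split()
    if is_legal_prefix(tokens):
        print(expression, "->", to_function_notation(tokens))
    else:
        print(expression, "-> not legal")
```

The recursive `parse` needs no lookahead and no brackets, which is another way of seeing why prefix notation is unambiguous.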

It's now clear what "any possible bracketing" means: It's any way to intersperse $n$ copies of $f$ among $a_0, a_1, ..., a_n$ (while keeping $a_0, a_1, ..., a_n$ in the same order) to form a legal expression in prefix notation.

Parentheses are just a silly way of notating what function notation does just as well $-$ no, better $-$ at expressing.

We've shown that bracketing is just an awkward consequence of using infix notation. But infix notation isn't just a load of crap as I've made it seem. There's more to it than that. In fact, infix notation is a terrific convention, in the following sense:

In prefix notation, if I want to write $A*B*C$, I have to choose between writing either "* A * B C" or "* * A B C". But if $*$ is an associative operation, then both ways are equal. In other words, the prefix notation is now forcing me to make a distinction between two orders of evaluation that I shouldn't have to distinguish between. Notation should not force me to make irrelevant distinctions. That's the advantage of writing $A*B*C$: it leaves unspecified whether I mean $(A*B)*C$ or $A*(B*C)$, which is a good thing because the distinction is irrelevant. Thus infix notation makes sense for $*$ and for many of our everyday operations too, which are often associative.

You're trying to prove general associativity, which, as you recognized, requires an understanding of what "bracketing" means. But bracketing is a relic of infix notation, which in turn is a notation intended to hide the bracketing for associative operations! That's why thinking in terms of prefix notation is the way to go here, where (a priori) we're not dealing with a general-associative operation.


For any positive integer $n$ we will define a function $f_n:\mathbb{R}^n\rightarrow P(\mathbb{R})$ where $P(\mathbb{R})$ denotes the power set of $\mathbb{R}$ (the set of all subsets of $\mathbb{R}$) by recursion on $n$.

Define $f_1(a_1) = \{(a_1)\}$.

For $n\geq 2$, define $f_n(a_1,\ldots,a_n) = \bigcup_{m=1}^{n-1} \{(x+y) : x \in f_m(a_1,\ldots,a_m) \text{ and } y \in f_{n-m}(a_{m+1},\ldots,a_n)\}$.

Then by a "bracketing of $a_1, a_2, \ldots, a_n$" we mean any element of $f_n(a_1,\ldots,a_n)$.

Now the generalized associative law (for addition in the real numbers) states that for each integer $n \geq 3$ and for all real numbers $a_1, \ldots, a_n$, the set $f_n(a_1,\ldots,a_n)$ contains only one member.
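The recursion for $f_n$ can be tried out directly. Here is a minimal Python sketch; the name `bracketing_values` is hypothetical, the operation is passed in as `op` so that a non-associative one can be compared, and integers are used so that equality is exact.

```python
def bracketing_values(values, op):
    """f_n(a_1, ..., a_n): the set of values of all bracketings of the tuple."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one element")
    if n == 1:
        return {values[0]}
    result = set()
    for m in range(1, n):  # split into the first m and the last n - m elements
        for x in bracketing_values(values[:m], op):
            for y in bracketing_values(values[m:], op):
                result.add(op(x, y))
    return result


def add(x, y):
    return x + y


def sub(x, y):
    return x - y


print(bracketing_values((1, 2, 3, 4), add))
print(sorted(bracketing_values((1, 2, 3, 4), sub)))
```

For addition the set has exactly one member, as the generalized associative law says; for subtraction it does not.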


A little old, but here's my two cents. Although darij's answer with binary trees is my favorite answer here and makes the most sense to me, I wouldn't want to introduce binary trees to prove this fact. So, I have a "different" definition of "bracketings".

First, some notation, since I have to use several function compositions. Fix a natural number $n$ that's at least $1$. Suppose that $X_1, X_2, \ldots, X_n, X_{n+1}$ denote $n+1$ sets. And suppose that $f_1, f_2, \ldots, f_n$ denote $n$ functions where $f_k:X_{k+1}\rightarrow X_{k}$ for every $k$ between $1$ and $n$, so that consecutive functions can be composed. Then I shall define $$\prod_{k=1}^n f_k=f_1\circ f_{2}\circ\cdots\circ f_{n-1}\circ f_n\,.$$ This could be made more rigorous with recursive definitions. But moving on....

Let's continue talking about the reals (although this could be extended to any field or even any group). Suppose that $n$ denotes a natural number that's at least $2$. For every $k$ between $1$ and $n-1$, there is a function $s_k^{(n)}:\mathbb{R}^n\rightarrow\mathbb{R}^{n-1}$ that adds the $k$th and $(k+1)$st coordinates together and leaves the other coordinates unchanged. In other words, $$s_k^{(n)}(a_1, a_2, \ldots, a_{k-1}, a_k, a_{k+1},a_{k+2}, \ldots, a_n)=(a_1, a_2, \ldots, a_{k-1}, a_k+a_{k+1}, a_{k+2}, \ldots, a_n)$$

Now, the generalized associative law is the same as saying that for every $n\geq 2$ and for every $k_2, k_3, k_4, \ldots, k_{n-1}, k_n$ such that $$1\leq k_j\leq j-1\text{ for every }j\text{ between }2 \text{ and }n$$ we may conclude the following equality: $$\sum_{k=1}^n a_k=\left(\prod_{j=2}^n s_{k_j}^{(j)}\right)(a_1, a_2, \ldots, a_n)\,.$$

The reader now has to convince themself in some fashion that each "valid bracketing" of $$a_1+a_2+\cdots+a_n$$ corresponds to some composition $$s_{1}^{(2)}\circ s_{k_3}^{(3)}\circ\cdots\circ s_{k_n}^{(n)}\,.$$ In general, there may be more than one composition associated with a given bracketing; for instance, both $s_1^{(2)}\circ s_1^{(3)}\circ s_3^{(4)}$ and $s_1^{(2)}\circ s_2^{(3)}\circ s_1^{(4)}$ yield $(a_1+a_2)+(a_3+a_4)$. To help get a grasp of what these $s_k^{(n)}$ represent, notice that $$s_1^{(2)}(a,b)=a+b\text{ for any pair of reals}\,,$$ $$s_1^{(2)}\circ s_1^{(3)}\circ\cdots\circ s_1^{(n)}(a_1, a_2, \ldots, a_n)=(\cdots(((a_1+a_2)+a_3)+a_4)+\cdots+a_n)\,,$$ $$s_1^{(2)}\circ s_2^{(3)}\circ\cdots\circ s_{n-1}^{(n)}(a_1, a_2, \ldots, a_n)=(a_1+\cdots+(a_{n-3}+(a_{n-2}+(a_{n-1}+a_n)))\cdots)\,.$$ A more concrete one: $$s_1^{(2)}\circ s_2^{(3)}\circ s_3^{(4)}\circ s_3^{(5)}(1,2,3,4,5)= 1+(2+((3+4)+5))$$
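The maps $s_k^{(n)}$ and their compositions are easy to experiment with. Below is a rough sketch; the names `s`, `apply_composition`, and `all_index_choices` are hypothetical and chosen only for this illustration. The list `ks` plays the role of $(k_2, \ldots, k_n)$, and the rightmost factor $s_{k_n}^{(n)}$ is applied first, as in a composition.

```python
from itertools import product

def s(k, args):
    """s_k^{(n)}: add coordinates k and k+1 (1-based) of an n-tuple."""
    n = len(args)
    if not 1 <= k <= n - 1:
        raise ValueError("need 1 <= k <= n - 1")
    return args[:k - 1] + (args[k - 1] + args[k],) + args[k + 1:]

def apply_composition(ks, args):
    """Apply s_{k_2}^{(2)} o ... o s_{k_n}^{(n)} to args, where ks = [k_2, ..., k_n]."""
    args = tuple(args)
    if len(ks) != len(args) - 1:
        raise ValueError("need exactly n - 1 indices for an n-tuple")
    for k in reversed(ks):  # the rightmost map in the composition acts first
        args = s(k, args)
    return args[0]

def all_index_choices(n):
    """All (k_2, ..., k_n) with 1 <= k_j <= j - 1."""
    return product(*(range(1, j) for j in range(2, n + 1)))

# The concrete example from the text: 1+(2+((3+4)+5)).
print(apply_composition([1, 2, 3, 3], (1, 2, 3, 4, 5)))
# Every admissible choice of indices gives the same total for these integers.
print({apply_composition(ks, (1, 2, 3, 4, 5)) for ks in all_index_choices(5)})
```

For $n=5$ there are $1\cdot 2\cdot 3\cdot 4 = 24$ index choices but only $14$ bracketings, which is consistent with the remark that several compositions can correspond to the same bracketing.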


A semigroup is a set $\mathbb{S}$ together with a binary operation (a function) $\circ:\mathbb{S}\times \mathbb{S} \rightarrow \mathbb{S}$ which satisfies, $\forall x,y,z \in \mathbb{S}$, $$ \circ(\circ(x, y), z) = \circ(x, \circ(y, z)) $$ For such binary operations we often use infix rather than prefix notation. That is, $\circ(x,y)$ is instead written as $(x\circ y)$ or $x\circ y$. We will follow that convention here, so the condition above would appear as $$ ((x\circ y)\circ z) = (x\circ (y\circ z)) $$

We follow the very nice suggestion from @LeonardBlackburn.

We recursively define the set of $n$-operations over the (ordered) list of $n$ operands $(a_0,\ldots, a_{n-1})\in \mathbb{S}^n$ as follows. $$ \circ _n:\mathbb{S}^n\rightarrow \mathcal{P}(\mathbb{S}) $$

  • If $n=1$ then, for $a\in\mathbb{S}$ we have $\circ _1(a) = \{a\}$
  • If $n>1$ then $\circ _n((a_0,\ldots, a_{n-1})) = \bigcup_{i=1}^{n-1} \{x\circ y:x\in\circ _i(a_0, \ldots, a_{i-1}) \text{ and } y\in \circ _{n-i}(a_i, \ldots, a_{n-1})\}$

As an example we write $\circ _4(a, b, c, d)$.

\begin{align} \circ _4(a, b, c, d) = \{&(a \circ (b \circ (c \circ d))),\\ &(a \circ ((b\circ c) \circ d)),\\ &((a\circ b)\circ (c\circ d)),\\ &((a\circ (b\circ c))\circ d),\\ &(((a\circ b)\circ c)\circ d)\} \end{align}

The idea is that $\circ _n((a_0,\ldots, a_{n-1}))$ is a set containing all "parenthesizations" of the ordered operations over the elements $(a_0,\ldots, a_{n-1})$.
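The recursive definition can be mirrored symbolically to list the parenthesizations themselves rather than their values. This is a small sketch, assuming operands are given as strings; the function name `parenthesizations` and the ASCII symbol `*` standing in for $\circ$ are choices made only for this illustration.

```python
def parenthesizations(operands, symbol="*"):
    """List every fully parenthesized expression over the ordered operands."""
    operands = list(operands)
    if len(operands) == 1:
        return [operands[0]]
    result = []
    for i in range(1, len(operands)):  # split into first i and remaining operands
        for x in parenthesizations(operands[:i], symbol):
            for y in parenthesizations(operands[i:], symbol):
                result.append(f"({x} {symbol} {y})")
    return result

# The five parenthesizations of four operands, in the order listed above.
for expr in parenthesizations("abcd"):
    print(expr)

# The number of parenthesizations for n = 1..6 operands.
print([len(parenthesizations("x" * n)) for n in range(1, 7)])
```

The counts that come out are the Catalan numbers, which gives a sense of how many expressions the generalized associative law identifies with one another.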

The generalized associative property can be stated as follows: for every $n\geq 1$ and every $(a_0, \ldots, a_{n-1})\in \mathbb{S}^n$, the set $\circ_n((a_0, \ldots, a_{n-1}))$ contains a single unique element. In other words, if $x\in \circ_n((a_0, \ldots, a_{n-1}))$ and $y\in \circ _n((a_0, \ldots, a_{n-1}))$ then $x=y$.

We now prove this by induction.

Base Case

Consider $n=1$. Consider $a\in \mathbb{S}$. We see that $\circ _1(a) = \{a\}$. Clearly this set has a single unique element.

Induction Hypothesis

For the induction hypothesis we fix $n>1$ and suppose that, for every $m<n$ and every $(a_0, \ldots, a_{m-1}) \in \mathbb{S}^m$, the set $\circ _m((a_0,\ldots, a_{m-1}))$ contains a single unique element. We denote this element by $a_0 \circ \ldots \circ a_{m-1}$ or $(a_0\circ \ldots\circ a_{m-1})$. Note, of course, that $(a_0\circ\ldots\circ a_{m-1}) \in \mathbb{S}$.

Lemma using the induction hypothesis

Note that, by the induction hypothesis, for $1 \le l < m < n$ we have that $(a_0 \circ \ldots \circ a_{l-1})$ is the unique element in $\circ _l((a_0, \ldots, a_{l-1}))$ and $(a_l\circ \ldots\circ a_{m-1})$ is the unique element in $\circ _{m-l}((a_l, \ldots, a_{m-1}))$. By the recursive definition of $\circ_m$, this means that $$ (a_0 \circ \ldots \circ a_{l-1}) \circ (a_l \circ \ldots \circ a_{m-1}) \in \circ _m((a_0, \ldots, a_{m-1})) $$ But $a_0 \circ \ldots \circ a_{m-1}$ was the unique element in this set. This means that $$ a_0 \circ \ldots \circ a_{m-1} = (a_0 \circ \ldots \circ a_{l-1}) \circ (a_l \circ \ldots \circ a_{m-1}) $$ Note that if $l=1$ the expression $(a_0\circ\ldots\circ a_{l-1})$ is interpreted as $a_0$ and, likewise, if $l=m-1$ the expression $(a_l\circ\ldots\circ a_{m-1})$ is interpreted as $a_{m-1}$.

Induction Step

We consider $\circ _n((a_0,\ldots, a_{n-1}))$. $$ \circ _n((a_0,\ldots, a_{n-1})) = \bigcup_{i=1}^{n-1} \{x\circ y:x\in \circ _i(a_0, \ldots, a_{i-1}) \text{ and } y\in \circ _{n-i}(a_i, \ldots, a_{n-1})\} $$ By the induction hypothesis since $i<n$ we have that $\circ _i((a_0, \ldots, a_{i-1}))$ contains a single unique element expressed as $(a_0\circ \ldots \circ a_{i-1})$ and since $n-i<n$ we have that $\circ _{n-i}((a_i, \ldots, a_{n-1}))$ contains a single unique element expressed as $(a_i\circ \ldots \circ a_{n-1})$.

We can then rewrite $$ \circ _n((a_0,\ldots, a_{n-1})) = \bigcup_{i=1}^{n-1} \{(a_0\circ \ldots\circ a_{i-1}) \circ (a_i \circ \ldots \circ a_{n-1})\} $$

Now consider $x, y \in \circ _n((a_0, \ldots, a_{n-1}))$. We will show that $x=y$. We have, for some $i$ and $j$ with $1\le i \le n-1$ and $1 \le j \le n-1$, that \begin{align} x =& (a_0 \circ \ldots \circ a_{i-1}) \circ (a_i \circ \ldots \circ a_{n-1})\\ y =& (a_0 \circ \ldots \circ a_{j-1}) \circ (a_j \circ \ldots \circ a_{n-1}) \end{align} If $i=j$ then clearly $x=y$. If not, we suppose, without loss of generality, that $j>i$. We can then write \begin{align} x =& (a_0 \circ \ldots \circ a_{i-1}) \circ (a_i \circ \ldots \circ a_{j-1} \circ a_j \circ \ldots \circ a_{n-1})\\ y =& (a_0 \circ \ldots \circ a_{i-1} \circ a_i \circ \ldots \circ a_{j-1}) \circ (a_j \circ \ldots \circ a_{n-1}) \end{align} Note that in the special case $i=j-1$ the expression $a_i \circ \ldots \circ a_{j-1}$ should be interpreted as $a_i =a_{j-1}$.

We then have, by the Lemma on the induction hypothesis (applied to the second factor of $x$, which has $n-i<n$ operands, and to the first factor of $y$, which has $j<n$ operands), that \begin{align} x =& (a_0 \circ \ldots \circ a_{i-1}) \circ ((a_i \circ \ldots \circ a_{j-1}) \circ (a_j \circ \ldots \circ a_{n-1}))\\ y =& ((a_0 \circ \ldots \circ a_{i-1}) \circ (a_i \circ \ldots \circ a_{j-1})) \circ (a_j \circ \ldots \circ a_{n-1}) \end{align} Let \begin{align} \alpha =& (a_0\circ \ldots \circ a_{i-1})\\ \beta =& (a_i \circ \ldots \circ a_{j-1})\\ \gamma =& (a_j \circ \ldots \circ a_{n-1}) \end{align} So that we see \begin{align} x =& \alpha \circ (\beta \circ \gamma)\\ y =& (\alpha \circ \beta) \circ \gamma \end{align} We then see, by usual associativity on $\circ$, that $x=y$ as needed.
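As a concrete instance of the step above (an illustration only, with the values $n=4$, $i=1$, $j=3$ chosen for the example), we have $$ x = a_0 \circ (a_1 \circ a_2 \circ a_3), \qquad y = (a_0 \circ a_1 \circ a_2) \circ a_3 $$ The Lemma, applied with three operands, gives $a_1 \circ a_2 \circ a_3 = (a_1 \circ a_2) \circ a_3$ and $a_0 \circ a_1 \circ a_2 = a_0 \circ (a_1 \circ a_2)$, so $$ x = a_0 \circ ((a_1 \circ a_2) \circ a_3), \qquad y = (a_0 \circ (a_1 \circ a_2)) \circ a_3 $$ Here $\alpha = a_0$, $\beta = (a_1 \circ a_2)$ and $\gamma = a_3$, and ordinary associativity gives $x = \alpha \circ (\beta \circ \gamma) = (\alpha \circ \beta) \circ \gamma = y$.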

This concludes the proof meaning that $\circ_n((a_0, \ldots, a_{n-1}))$ has a single unique element which we denote as $a_0 \circ \ldots \circ a_{n-1}$ or $(a_0 \circ \ldots \circ a_{n-1})$. Thus associativity of a binary operation implies general associativity of the binary operation. We may also use notation like $$ \bigcirc_{i=0}^{n-1} a_i = (a_0 \circ \ldots \circ a_{n-1}) = a_0 \circ \ldots \circ a_{n-1} $$

I'll emphasize that the most challenging part of this proof was coming up with a satisfactory formalization of the concept of "all parenthesizations of an operation over a list of operands".

Finally, this proof took a symmetric, abstract approach in which any $x, y\in \circ_n((a_0, \ldots, a_{n-1}))$ are shown directly to be equal to each other. This approach has the advantage that no particular parenthesization gets a privileged position. However, an alternative approach, which has the advantage of being slightly more concrete, but the disadvantage of arbitrarily privileging a particular parenthesization, would be to show that there is a particular $z\in \circ_n((a_0, \ldots, a_{n-1}))$ such that every $x \in \circ_n((a_0, \ldots, a_{n-1}))$ satisfies $x=z$. For example, we might choose $z$ to be the so-called left-parenthesization: $$ z = (\ldots((a_0 \circ a_1) \circ \ldots) \circ a_{n-1}) $$
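The alternative approach can be illustrated numerically: compute the left-parenthesization $z$ as a left fold and compare it with the set of values of all parenthesizations. This is only a sketch under stated assumptions. The names `all_values`, `left_parenthesization`, and `concat` are hypothetical, and string concatenation is used as a sample associative (but non-commutative) operation.

```python
from functools import reduce
import operator

def all_values(items, op):
    """Set of values of every parenthesization of the ordered items under op."""
    if len(items) == 1:
        return {items[0]}
    return {
        op(x, y)
        for i in range(1, len(items))
        for x in all_values(items[:i], op)
        for y in all_values(items[i:], op)
    }

def left_parenthesization(items, op):
    """Value of (...((a_0 o a_1) o a_2) ... o a_{n-1}), computed as a left fold."""
    return reduce(op, items)

def concat(x, y):
    # String concatenation: associative but not commutative.
    return x + y

words = ("ab", "c", "", "de", "f")
z = left_parenthesization(words, concat)
# For an associative operation, every parenthesization equals z.
print(all_values(words, concat) == {z})
# Exponentiation is not associative: (2**3)**2 differs from 2**(3**2).
print(sorted(all_values((2, 3, 2), operator.pow)))
```

Using a non-commutative operation here is deliberate: it emphasizes that general associativity concerns only the placement of brackets, not the order of the operands.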
