I've already asked about the definition of tensor product here and now I understand the steps of the construction. I'm just in doubt about the motivation to construct it in that way. Well, if all that we want is to have tuples of vectors that behave linearly on addition and multiplication by scalar, couldn't we just take all vector spaces $L_1, L_2,\dots,L_p$, form their cartesian product $L_1\times L_2\times \cdots \times L_p$ and simply introduce operations analogous to that of $\mathbb{R}^n$ ?

We would get a space of tuples of vectors on which all those linear properties are obeyed. What's the reason/motivation for defining the tensor product using the free vector space and that quotient to impose linearity? Can someone point out the motivation for that definition?

Thanks very much in advance.

    If you wanted linearity alone, use the direct sum. But if you want *multilinearity* (which means linearity in each component while all other components are *fixed*) you definitely can't use the direct sum: addition is not multilinear. Multiplication is multilinear, and that's what the tensor product is designed to mimic. It turns multilinear maps (make sure you know what that term means) into linear maps. – KCd Mar 16 '13 at 15:18
    There are reopen votes -- why? This looks like a bona fide duplicate to me. – hmakholm left over Monica Dec 18 '13 at 15:03

5 Answers


The "product" of vector spaces is more properly thought of as a direct sum, since the dimension grows additively, $\dim(V\oplus W)=\dim(V)+\dim(W)$. Tensor products are much bigger in size than sums, since we have $\dim(V\otimes W)=\dim(V)\times\dim(W)$. In fact, in analogy with elementary arithmetic, we have distributivity $(A\oplus B)\otimes C\cong (A\otimes C)\oplus(B\otimes C)$.
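
As a numerical sanity check of the dimension formulas, here is a minimal sketch using NumPy, where `np.kron` serves as a coordinate model of $\otimes$ on vectors (all names here are my own illustrative choices):

```python
import numpy as np

# Toy dimensions for V = R^2, W = R^3.
dim_V, dim_W = 2, 3

# Direct sum: dimensions add.  Tensor product: dimensions multiply.
dim_direct_sum = dim_V + dim_W   # dim(V ⊕ W) = 5
dim_tensor = dim_V * dim_W       # dim(V ⊗ W) = 6

# Distributivity (A ⊕ B) ⊗ C ≅ (A ⊗ C) ⊕ (B ⊗ C), checked on dimensions:
dim_A, dim_B, dim_C = 2, 3, 4
assert (dim_A + dim_B) * dim_C == dim_A * dim_C + dim_B * dim_C

# Concretely, np.kron realizes v ⊗ w as a vector of length dim_V * dim_W.
v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])
assert np.kron(v, w).shape == (dim_V * dim_W,)
```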

Note (as KCd says in the comments), linearity in $V\oplus W$ and $V\otimes W$ are very different:

$$\begin{array}{cl}V\oplus W: & (a+b,c+d) & = (a,0)+(b,0)+(0,c)+(0,d) \\ V\otimes W: & (a+b)\otimes(c+d) & = a\otimes c+a\otimes d+b\otimes c+b\otimes d. \end{array}$$
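The two expansions above can be verified numerically; this sketch again uses `np.kron` as a coordinate model of $\otimes$ and `np.concatenate` as a model of the direct sum (the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=2), rng.normal(size=2)   # vectors in V = R^2
c, d = rng.normal(size=3), rng.normal(size=3)   # vectors in W = R^3

# Tensor product: (a+b) ⊗ (c+d) = a⊗c + a⊗d + b⊗c + b⊗d
lhs = np.kron(a + b, c + d)
rhs = np.kron(a, c) + np.kron(a, d) + np.kron(b, c) + np.kron(b, d)
assert np.allclose(lhs, rhs)

# Direct sum: (a+b, c+d) = (a,0) + (b,0) + (0,c) + (0,d)
lhs_sum = np.concatenate([a + b, c + d])
rhs_sum = (np.concatenate([a, np.zeros(3)]) + np.concatenate([b, np.zeros(3)])
           + np.concatenate([np.zeros(2), c]) + np.concatenate([np.zeros(2), d]))
assert np.allclose(lhs_sum, rhs_sum)
```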

Another item you might forget is that the tensor product $V\otimes W$ does not consist solely of the so-called "pure tensors" of the form $v\otimes w$; it also contains linear combinations of these elements. While pure tensors may be decomposed into sums of pure tensors (using linearity on both sides of the symbol $\otimes$), not every tensor can be put into the form of a pure tensor.
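One concrete way to see this: identifying $V\otimes W$ with matrices (so $v\otimes w\leftrightarrow$ the outer product), the pure tensors are exactly the matrices of rank at most 1. A sketch, with $V=W=\mathbb R^2$:

```python
import numpy as np

# A pure tensor v ⊗ w corresponds to the rank-1 matrix np.outer(v, w).
v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])
assert np.linalg.matrix_rank(np.outer(v, w)) == 1

# e1 ⊗ e1 + e2 ⊗ e2 corresponds to the identity matrix: rank 2,
# hence this sum of pure tensors is NOT itself a pure tensor.
e1, e2 = np.eye(2)
t = np.outer(e1, e1) + np.outer(e2, e2)
assert np.linalg.matrix_rank(t) == 2
```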

Here's one way to formally think about the difference, in a quantum-mechanical spirit. Given a distinguished basis of $V$ and $W$, say $X={\cal B}_V$ and $Y={\cal B}_W$, we may say that $V$ and $W$ are the free vector spaces generated from $X$ and $Y$, i.e. that they each are formal $K$-linear ($K$ being the base field) combinations of elements of $X$ and $Y$ respectively, i.e. $V=KX$ and $W=KY$.

Then, assuming $X,Y$ are disjoint, we may say $V\oplus W\cong K(X\cup Y)$, i.e. we allow the two bases together to form a new basis. But $V\otimes W\cong K(X\times Y)$, and $X\times Y$ is certainly different from the union $X\cup Y$. If we think about $X$ and $Y$ as being sets of "pure states" of some theoretical system, then the direct sum says we think about $X$ and $Y$ as disjoint collections of pure states of a single system, and view the vector spaces as superpositions of pure states, in which case the direct sum is just opening ourselves up to both collections of pure states when we make our superpositions.

But the tensor product has as basis the collection of pure states of the composite system of the two systems underlying $X$ and $Y$. That is, we view them as distinct systems that make up a larger system, so that the state of system 1 may vary independently of the state of system 2, in which case the collection of pure states for the composite system is $X\times Y$.

The tensor product is a way to encode multilinearity, though the binary operation $\otimes$ by itself only encodes bilinearity. That is, the space of bilinear maps into the ground field $K$, the first argument taking vectors from $U$ and the second taking vectors from $V$, is the tensor product $U^*\otimes V^*$. The dual spaces $U^*$ and $V^*$ (viewing everything as finite-dimensional) have bases that come from bases $\{u_i\}$ and $\{v_i\}$ on $U$ and $V$ respectively.

Specifically, $U^*$ has basis $\{u_i^*\}$, where $u_i^*(u_j)=\delta_{ij}$ is the scalar part of the projection onto the one-dimensional subspace generated by $u_i$, and similarly for $v_i^*$. For the vector space of bilinear maps $U,V\to K$, it suffices to check where the basis pairs $u_i,v_j$ are sent, so we can define maps $(u_i\otimes v_j)(u_k,v_\ell)=\delta_{ik}\delta_{j\ell}$ in the exact same spirit, and these bilinear maps $u_i\otimes v_j$ will form a basis of the space of all bilinear maps. (Note that $u_i\otimes v_j$ says "apply $u_i^*$ to the first argument and $v_j^*$ to the second, and multiply the two resulting scalars.") This is the ground covered by muzzlator.
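A sketch of this basis of bilinear maps, with $U=\mathbb R^2$, $V=\mathbb R^3$ and the standard bases (the helper name `elementary_bilinear` is my own):

```python
import numpy as np

# u_i ⊗ v_j : "apply u_i* to the first argument, v_j* to the second,
# and multiply the two resulting scalars."
def elementary_bilinear(i, j):
    return lambda u, v: u[i] * v[j]

# Check (u_i ⊗ v_j)(u_k, v_l) = δ_ik δ_jl on basis pairs.
U_basis, V_basis = np.eye(2), np.eye(3)
for i in range(2):
    for j in range(3):
        B = elementary_bilinear(i, j)
        for k in range(2):
            for l in range(3):
                assert B(U_basis[k], V_basis[l]) == (i == k) * (j == l)

# Every bilinear form B(u, v) = u^T M v is a linear combination of these,
# with coefficients M[i, j]:
M = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])
u, v = np.array([1.0, -2.0]), np.array([0.5, 1.0, 2.0])
combo = sum(M[i, j] * elementary_bilinear(i, j)(u, v)
            for i in range(2) for j in range(3))
assert np.isclose(combo, u @ M @ v)
```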

This allows us to reinterpret linear maps between vector spaces in a number of new ways. In particular, the linear maps $U\to V$ may be reinterpreted as linear maps $U\otimes V^*\to K$, or $V^*\to U^*$, or $K\to U^*\otimes V^*$. We also have the tensor-hom adjunction $$\hom(U\otimes V,W)\cong\hom(U,\hom(V,W)),$$ where $\hom(A,B)$ is the space of linear maps $A\to B$. This is the "category of ($K$-)vector spaces" version of the set-theoretic concept of "currying," where a map $A\times B\to C$ can be reinterpreted as a map from $A$ into the set of maps from $B\to C$ (here $A,B,C$ are sets and maps are not in any special algebraic sense homomorphisms, they are just maps).
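In coordinates, the tensor-hom adjunction is literally an array reshape: a linear map $U\otimes V\to W$ is an array of shape $(\dim W,\ \dim U\cdot\dim V)$, and reading it as shape $(\dim W, \dim U, \dim V)$ gives "for each direction in $U$, a linear map $V\to W$". A sketch under that identification:

```python
import numpy as np

dim_U, dim_V, dim_W = 2, 3, 4
rng = np.random.default_rng(1)

T = rng.normal(size=(dim_W, dim_U * dim_V))   # a linear map U ⊗ V → W
T_curried = T.reshape(dim_W, dim_U, dim_V)    # read as a map U → hom(V, W)

u = rng.normal(size=dim_U)
v = rng.normal(size=dim_V)

# Applying T to u ⊗ v agrees with first "feeding in" u (currying),
# then applying the resulting map V → W to v.
direct = T @ np.kron(u, v)
curried = np.einsum('wuv,u->wv', T_curried, u) @ v
assert np.allclose(direct, curried)
```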

Tensor products are the formal machinery behind the concept of "extension of scalars." For instance, given a real vector space $V$, how could we make it a complex vector space? We aren't a priori allowed to multiply vectors by nonreal scalars, but if we pretend we can (just look at the space $V\oplus iV$ with the obvious notion of complex scalar multiplication) we have a complex vector space. This process is called complexification, and it can be done simply by tensoring $V$ over $\bf R$ against $\bf C$, i.e. the complexification may be given by $V_{\bf C}\cong{\bf C}\otimes_{\bf R}V$. This allows us to left multiply by complex scalars in a consistent manner.
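A small numerical illustration of complexification, modeling $V_{\bf C}\cong{\bf C}\otimes_{\bf R}V$ on coordinates by just admitting complex scalars (so that $V\oplus iV$ appears as real and imaginary parts):

```python
import numpy as np

n = 3
v = np.array([1.0, 2.0, 3.0])        # a vector in the real space V = R^3

v_complex = v.astype(complex)        # the same vector, viewed in C ⊗_R V
w = (2 + 1j) * v_complex             # complex scalar multiplication now makes sense

assert np.allclose(w.real, 2 * v)    # the "V" summand of V ⊕ iV
assert np.allclose(w.imag, 1 * v)    # the "iV" summand

# As a real vector space, dim_R(C ⊗_R V) = dim_R(C) * dim_R(V) = 2n.
assert 2 * n == 6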

The construction is not limited to going from real to complex vector spaces, though. If $V$ is a $K$-vector space and $L/K$ is an extension field of $K$, we can make $V$ an $L$-vector space via $L\otimes_KV$. Since $L$ is itself a $K$-vector space, we could pick a $K$-basis $\{\ell_i\}$ for it (possibly infinite, even uncountable) and extend the scalars formally via $\bigoplus_i \ell_i V$, but tensoring is succinct and coordinate-free. The very same ideas apply to modules, which are more general than vector spaces.

When we allow our vector space to have a multiplication operation compatible with the linear structure (so, a $K$-algebra), we can extend the multiplication to the tensor product. This allows us to "glue" algebras together (more than just tacking on extra scalars). Or rings in general, actually.

In particular, for $R$ a ring, $R[x]\otimes_R R[y]\cong R[x,y]$ as polynomial rings. The multiplication operation is extended from $(a\otimes b)(c\otimes d)=ac\otimes bd$ via linearity (which can be seen to be well-defined).
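The isomorphism $R[x]\otimes_R R[y]\cong R[x,y]$ can be made concrete with coefficient arrays: a pure tensor $p(x)\otimes q(y)$ corresponds to the bivariate polynomial $p(x)q(y)$, whose coefficient matrix is the outer product of the coefficient vectors. A sketch (coefficients stored lowest degree first):

```python
import numpy as np

p = np.array([1.0, 2.0])        # p(x) = 1 + 2x
q = np.array([3.0, 0.0, 1.0])   # q(y) = 3 + y^2

# coeffs[i, j] = coefficient of x^i y^j in p(x) q(y)
coeffs = np.outer(p, q)
# p(x) q(y) = 3 + y^2 + 6x + 2x y^2
assert np.allclose(coeffs, [[3.0, 0.0, 1.0], [6.0, 0.0, 2.0]])

# Evaluation respects the identification: (p ⊗ q)(x0, y0) = p(x0) q(y0).
x0, y0 = 2.0, -1.0
val = sum(coeffs[i, j] * x0**i * y0**j
          for i in range(coeffs.shape[0]) for j in range(coeffs.shape[1]))
assert np.isclose(val, np.polyval(p[::-1], x0) * np.polyval(q[::-1], y0))
```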

Finally, KCd mentions in passing induction and restriction of representations. As a representation $V$ of $G$ over $K$ may be viewed as a $K[G]$-module, induction can be seen as ${\rm Ind}_H^GV\cong K[G]\otimes_{K[H]}V$, although probably the more natural definitions are "induction is the left-adjoint (and coinduction is the right-adjoint) of restriction," (see adjoint functor) which is the categorical version of the statement of Frobenius reciprocity.

By quotienting a tensor power $V^{\otimes n}:=V\otimes V\otimes\cdots\otimes V$ by certain relations, we can obtain the exterior power $\Lambda^nV$ (there is also a symmetric power), which is spanned by alternating multilinear symbols of the form $v_1\wedge v_2\wedge\cdots\wedge v_n$. This allows for a new definition of the determinant map and hence of characteristic polynomials too, and it also allows the creation of the exterior algebra of differential forms, a very intrinsic way of working with geometric, multidimensional infinitesimals (informally speaking). Another application of tensor powers: by directly summing the tensor powers of a Lie algebra $\frak g$ (and quotienting by the relations $x\otimes y-y\otimes x-[x,y]$), we obtain the universal enveloping algebra $U({\frak g})$.
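A sketch of the determinant arising from the top exterior power: since $\Lambda^n(\mathbb R^n)$ is one-dimensional, $v_1\wedge\cdots\wedge v_n=\det(v_1,\dots,v_n)\,e_1\wedge\cdots\wedge e_n$, and antisymmetrizing the tensor $v_1\otimes\cdots\otimes v_n$ recovers that coefficient (the Leibniz formula). The helper name below is my own:

```python
import itertools
import numpy as np

def wedge_coefficient(vectors):
    """Coefficient of e_1 ∧ ... ∧ e_n in v_1 ∧ ... ∧ v_n,
    computed by antisymmetrizing v_1 ⊗ ... ⊗ v_n (Leibniz formula)."""
    n = len(vectors)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = np.linalg.det(np.eye(n)[list(perm)])  # sign of the permutation
        term = 1.0
        for row, col in enumerate(perm):
            term *= vectors[row][col]
        total += sign * term
    return total

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
# The wedge coefficient of the rows equals the determinant.
assert np.isclose(wedge_coefficient(list(A)), np.linalg.det(A))
```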

  1. Tensor products turn multilinear algebra into linear algebra. That's the point (or at least one point).

  2. They let you treat different kinds of base extension (e.g., viewing a real matrix as a complex matrix, making a polynomial in ${\mathbf Z}[X]$ into a polynomial in $({\mathbf Z}/m{\mathbf Z})[X]$, turning a representation of a subgroup $H$ into a representation of the whole group $G$) as special instances of one general construction.

  3. They provide a mathematical explanation for the phenomenon of "entangled" states in quantum mechanics (a tensor that is not an elementary tensor).

See Why is the tensor product important when we already have direct and semidirect products? for more answers to your question (it's a duplicate question).


When I studied tensor products, I was lucky to find this wonderful article by Tom Coates. Starting from the very simple example of functions on the product space, he explains the intuition behind tensor products very clearly.

Martin Sleziak
Hui Yu
  • The link seems to be dead now, but a version seems to be saved in the Internet Archive: http://web.archive.org/web/%2A/http://www.math.harvard.edu/~tomc/math25/tensor.pdf http://web.archive.org/web/20110826113738/http://www.math.harvard.edu/~tomc/math25/tensor.pdf and it can probably also be found in other places: https://www.google.com/search?q=%22the+tensor+product+of+vector+spaces%22+%22tom+coates%22 – Martin Sleziak May 19 '15 at 08:26

The best way to see why the tensor product is defined the way it is is to consider bilinear functions $B : V \times V \rightarrow \mathbb{R}$.

Suppose we wish to represent $B$ by a single linear map of vector spaces $\hat{B} : X \rightarrow \mathbb{R}$. What sort of basis would our space $X$ need to have in order to define $\hat{B}$ from $B$?

Let $\{e_i\}$ be a basis for $V$; then each value $B(e_i, e_j)$ can be prescribed independently. The reason we can't simply pick $X$ to be $V \times V$ is that if we define $\hat{B}(x) = B(x_1, x_2)$, then $$\hat{B}(e_i, e_j) = \hat{B}(e_i, 0) + \hat{B}(0, e_j) = B(e_i, 0) + B(0, e_j) = 0 + 0 = 0$$

where $x = (x_1, x_2) \in V \times V$.

Instead we need $X$ to be generated by pairs of basis vectors themselves, $X = \langle e_i \otimes e_j\rangle $ and then we can think of $\hat{B}$ as a linear operator on this space.
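This contrast can be checked numerically: a bilinear form $B(x,y)=x^TMy$ on $V\times V$ is not linear in the pair $(x,y)$, but it is the restriction of a genuinely linear functional on $V\otimes V$ (modeled here by `np.kron`):

```python
import numpy as np

# A bilinear form B(x, y) = x^T M y on V = R^2.
M = np.array([[1.0, 2.0], [3.0, 4.0]])
B = lambda x, y: x @ M @ y

# The corresponding LINEAR functional B_hat on V ⊗ V: its matrix of
# coefficients, flattened to match the ordering of np.kron.
B_hat = M.flatten()

rng = np.random.default_rng(2)
x, y = rng.normal(size=2), rng.normal(size=2)
assert np.isclose(B(x, y), B_hat @ np.kron(x, y))

# On V × V, linearity would force B to vanish on basis pairs, since
# (e_i, e_j) = (e_i, 0) + (0, e_j) while B(e_i, 0) = B(0, e_j) = 0:
e1 = np.array([1.0, 0.0])
assert B(e1, np.zeros(2)) == 0.0 and B(np.zeros(2), e1) == 0.0
```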


The tensor product IS an analogue of the Cartesian product, but it is AN analogue of the Cartesian product. What I mean by this is the following.

When working with sets, the Cartesian product happens to satisfy TWO properties.

  1. The first property is that if you have two functions $f\colon A\to X$ and $g\colon A\to Y$, you can "pair them up" to get a unique function $(f,g)=h\colon A\to X\times Y$; conversely, for any function $h\colon A\to X\times Y$ there are unique functions $f\colon A\to X$ and $g\colon A\to Y$ such that $h=(f,g)$.

  2. The second property is that any $X$-parametrized family of functions $\{f_x\colon Y\to B\}_{x\in X}$ corresponds to a function $f\colon X\times Y\to B$ under the identification $f(x,y)=f_x(y)$.
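
The second property, concretely, is just currying: an $X$-parametrized family of functions is the same data as a single function on the Cartesian product. A small sketch with toy sets of my own choosing:

```python
# An X-parametrized family {f_x : Y → B} of functions...
X = ["red", "blue"]
Y = [1, 2, 3]
family = {"red": lambda y: y + 1, "blue": lambda y: 2 * y}

# ...is the same data as one function f : X × Y → B, via f(x, y) = f_x(y):
def f(x, y):
    return family[x](y)

assert f("red", 3) == 4
assert f("blue", 3) == 6

# And currying f recovers the family.
curried = {x: (lambda y, x=x: f(x, y)) for x in X}
assert all(curried[x](y) == family[x](y) for x in X for y in Y)
```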

When working with vector spaces (or $R$-modules), the direct product satisfies the first property, while the tensor product satisfies the second property. In fact, the first property is the definition of the product, the second property is the definition of the tensor product, and it is a theorem that these definitions can be satisfied by sets and by vector spaces; in fact, by the same sets, but by different vector spaces.

The proof of the theorem for sets is just checking that the Cartesian product (which we have constructed) satisfies the properties. To prove the theorem for vector spaces, we need to construct both the direct product and the tensor product. (Of course, the two properties for vector spaces will talk about linear functions and linear parametrizations rather than just functions and parametrizations.)

In order to perform these constructions, we have to take note of certain (easy to check, but conceptually new) relationships between vector spaces and sets. The keyword here is adjoint functors, though I will not use it, merely illustrate it.

Properties of free vector spaces

First, for any vector space $V$ we have the underlying set $\mathcal SV$ consisting of the vectors in $V$. Additionally, for any linear map $\phi\colon V\to W$ we have the underlying set-map $\mathcal S\phi\colon\mathcal SV\to\mathcal SW$ given by $(\mathcal S\phi)(v)=\phi(v)$.

Second, for any set $S$ we have the free vector space $\mathcal FS$ consisting of finite linear combinations of elements of $S$ (this is the vector space with basis $S$). Free vector spaces satisfy the following property:

  • Any function $f\colon S\to \mathcal SV$ corresponds to a unique linear function $\phi\colon\mathcal FS\to V$, and conversely. In other words, linear functions are uniquely determined by their values on bases.
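
In coordinates, this universal property is the familiar fact that choosing images for the basis vectors is exactly choosing the columns of a matrix, and the linear extension is matrix multiplication. A minimal sketch:

```python
import numpy as np

# A function on the basis {e_1, e_2} of R^2, with values in R^3
# (toy values of my own choosing):
basis_values = {0: np.array([1.0, 0.0, 2.0]),   # image of e_1
                1: np.array([0.0, 3.0, 1.0])}   # image of e_2

# Its unique linear extension is the matrix whose columns are those values.
A = np.column_stack([basis_values[i] for i in range(2)])

def linear_extension(v):
    # extend f(e_i) = basis_values[i] linearly to all of R^2
    return sum(v[i] * basis_values[i] for i in range(2))

v = np.array([2.0, -1.0])
assert np.allclose(linear_extension(v), A @ v)
```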

In particular, note that the identity function $\DeclareMathOperator{\id}{id}\id\colon\mathcal SV\to\mathcal SV$ has to correspond to a unique linear map $\xi\colon \mathcal F\mathcal SV\to V$. This map is significant because its kernel consists precisely of the linearity relations among the vectors of $V$. It follows that $f\colon\mathcal SV\to\mathcal SW$ is the underlying set-map of a linear map $\phi\colon V\to W$ if and only if the induced map $\psi\colon \mathcal F\mathcal S V\to W$ vanishes on $\ker\xi$. Furthermore, one can show that when that is the case, $\psi$ factors through $\xi$ as $\psi=\phi\circ\xi$, where $f=\mathcal S\phi$.

Even better, if we have a map $f\colon\mathcal SV\to X$, this gives an induced map $\psi\colon\mathcal F\mathcal S V\to\mathcal FX$. If $\psi$ vanishes on $\ker\xi$, then it factors as $\psi=\phi\circ\xi$ with $\phi\colon V\to \mathcal FX$, and the image has underlying set $X$ in the sense that $\mathcal S(\operatorname{im}(\phi))=X$. Now, the condition that $\psi$ vanishes on $\ker\xi$ depends only on $f$, and corresponds exactly to the case when $f\colon\mathcal SV\to X$ induces on $X$ the structure of a vector space $X_f$ such that $f=\mathcal S\phi$ for $\phi\colon V\to X_f$.

Direct product

Consider two linear functions $\phi\colon U\to V_1$ and $\psi\colon U\to V_2$, and let $f\colon \mathcal SU\to\mathcal SV_1$ and $g\colon\mathcal SU\to\mathcal SV_2$ be the underlying set-maps $f=\mathcal S\phi$, $g=\mathcal S\psi$. Then we definitely have a set-map $(f,g)\colon\mathcal SU\to\mathcal SV_1\times\mathcal SV_2$. It is not difficult to check that $(f,g)$ induces a vector-space structure on $\mathcal SV_1\times\mathcal SV_2$, though it is slightly more work to show that this is the same vector-space structure for any pair of maps $f$, $g$. Nevertheless, this gives us the direct product of vector spaces, $V_1\oplus V_2$.

Tensor product

Consider a $V_1$-parametrized family of (linear) functions $\{\phi_v\colon V_2\to W\}_{v\in V_1}$. Applying the forgetful functor, we get an $\mathcal SV_1$-parametrized family of maps $\{f_{v}\colon\mathcal SV_2\to\mathcal SW\}_{v\in \mathcal SV_1}$. By the second property of the Cartesian product, this gives a map $f\colon \mathcal SV_1\times\mathcal S V_2\to\mathcal SW$. Applying the free functor gives us a function $\phi\colon\mathcal F(\mathcal SV_1\times\mathcal SV_2)\to W$ which is linear. Thus, $V_1$-parametrized families $\{\phi_v\colon V_2\to W\}_{v\in V_1}$ of linear maps embed as linear maps $\phi\colon\mathcal F(\mathcal SV_1\times\mathcal SV_2)\to W$. Taking some care, we can show that the necessary and sufficient condition for $f\colon\mathcal SV_1\times\mathcal SV_2\to\mathcal SW$ to come from a linear parametrization is that the induced $\phi$ vanishes on $\ker\xi_1\oplus\ker\xi_2$, where $\xi_i\colon\mathcal F\mathcal S V_i\to V_i$ are the linear maps induced by $\id\colon\mathcal SV_i\to\mathcal SV_i$. Hence taking the quotient with respect to that subspace (which is just the one spanned by the multilinearity relations) gives a new vector space $V_1\otimes V_2$ with the property that $V_1$-linearly parametrized families of maps $\{\phi_v\colon V_2\to W\}_{v\in V_1}$ naturally correspond to linear maps $\bar\phi\colon V_1\otimes V_2\to W$.

Vladimir Sotirov