I've always used the method of Lagrange multipliers with blind confidence that it will give the correct results when optimizing functions subject to constraints. But I would like to know if anyone can provide or recommend a derivation of the method at physics undergraduate level that can highlight its limitations, if any.

@John, you may or may not find the answers to this similar question helpful: http://math.stackexchange.com/q/674/400 – Vladimir Sotirov Feb 27 '11 at 06:03
2 Answers
Lagrange multipliers are used to obtain the maximum of a function $f(\mathbf{x})$ on a surface $\{ \mathbf{x}\in\mathbb{R}^n\mid g(\mathbf{x}) = 0\}$ (I use "surface", but whether it is a 2-dimensional, 1-dimensional, or whatever-dimensional object will depend on the $g$ and the $\mathbb{R}^n$ we are dealing with).
The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$. If we want to find the largest value of $f$ along the surface, then we need the direction of greatest increase to be orthogonal to the surface; otherwise, moving along the surface will "capture" some of that increase and $f$ will not achieve its maximum along the surface at that point (this is akin to the fact that in one-variable calculus, the derivative should be $0$ at the maximum; otherwise, moving a bit in one direction will increase the value of the function).
In order for $\nabla f$ to be perpendicular to the surface, it must be parallel to the gradient of $g$; so $\nabla f$ must be a scalar multiple of $\nabla g$. So this amounts to finding a solution to the system \begin{align*} \nabla f(\mathbf{x}) &= \lambda \nabla g(\mathbf{x})\\ g(\mathbf{x}) &= 0 \end{align*} for both $\mathbf{x}$ and $\lambda$.
Added. Such a point is not guaranteed to be a maximum or a minimum; it could also be a saddle point, or nothing at all, much as in the one-variable case, where points with $f'(x)=0$ are not guaranteed to be extrema of the function. Another obvious limitation is that if the surface $g(\mathbf{x})=0$ is not differentiable (does not have a well-defined gradient) then you cannot even set up the system.
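As a concrete sketch of the system described above, one can solve $\nabla f = \lambda \nabla g$, $g = 0$ symbolically. The particular $f$ and $g$ below are my own illustrative choices (not from the answer), and `sympy` is assumed to be available:

```python
import sympy as sp

# Illustrative example: extremize f(x, y) = x + y on the unit circle
# g(x, y) = x^2 + y^2 - 1 = 0.
x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1

# The Lagrange system: grad f = lam * grad g, together with g = 0.
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 0)]
solutions = sp.solve(eqs, [x, y, lam], dict=True)

# Each solution is only a *candidate*; classify by comparing f-values.
for s in solutions:
    print(s, '  f =', f.subs(s))
```

Note how the last step reflects the caveat above: the system yields two candidates here (one maximum, one minimum), and the method itself does not tell you which is which.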

+1, although I would add, as a nod to the OP's request to "highlight its limitations," that not every solution to the system is guaranteed to be a maximum or minimum of $f$ on $g(\mathbf{x}) = 0$ (as with the single-variable case with the derivative being zero). – Mike Spivey Feb 27 '11 at 05:17



An answer by @ArturoMagidin! I think I will upvote without reading it. And then I'll read it. – badatmath Nov 15 '12 at 00:33

Can you please elaborate on what you mean by "The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$."? Do you mean that $\nabla f$ points in the direction of the greatest increase for $f$ at the point where $f$ is maximal? – M Smith Dec 02 '15 at 16:29

Nice answer, but I think you have an extra copy of the words "will increase" – J. W. Tanner Feb 03 '20 at 01:02

@J.W.Tanner: Yes, but I'm not going to bump an 11-year-old question to correct a bit of grammar that does not obscure the meaning nor mislead the reader. – Arturo Magidin Feb 03 '20 at 01:31
An algebraic way of looking at this is as follows:
From an algebraic viewpoint, we know how to find the extremum of a function of many variables: say we want to find the extremum of $f(x_1,x_2,\ldots,x_n)$; we set the gradient to zero and look at the definiteness of the Hessian.
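This unconstrained recipe can be sketched in a few lines. The particular $f$ below is a hypothetical example of mine, and `sympy` is assumed:

```python
import sympy as sp

# Hypothetical example: classify the critical point of
# f(x1, x2) = x1^2 + 3*x2^2 - 2*x1 via the gradient/Hessian recipe.
x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + 3*x2**2 - 2*x1

grad = [sp.diff(f, v) for v in (x1, x2)]
critical = sp.solve(grad, [x1, x2], dict=True)   # set gradient to zero

H = sp.hessian(f, (x1, x2))
print(critical, H.is_positive_definite)  # positive definite => a minimum
```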
We would like to extend this idea to the case where we want to find the extremum of a function subject to some constraints. Say the problem is: $$\begin{align} \text{Minimize } &f(x_1,x_2,\ldots,x_n)\\ \text{subject to: } &g_k(x_1,x_2,\ldots,x_n) = 0, \quad k \in \{1,2,\ldots,m\} \end{align}$$
If we find the extrema of $f$ just by setting the gradient of $f$ to zero, these extrema need not satisfy the constraints.
Hence, we would like to include the constraints in the previous idea. One way to do this is as follows. Define a new function: $$F(\vec{x},\vec{\lambda}) = f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \lambda_2 g_2(\vec{x}) - \cdots - \lambda_m g_m(\vec{x})$$ where $\vec{x} = \left[ x_1,x_2,\ldots,x_n \right]$ and $\vec{\lambda} = \left[\lambda_1,\lambda_2,\ldots,\lambda_m \right]$.
Note that when the constraints are enforced we have $F(\vec{x},\vec{\lambda}) = f(\vec{x})$, since each $g_j(\vec{x}) = 0$.
Let us find the extremum of $F(\vec{x},\vec{\lambda})$. This is done by setting $\frac{\partial F}{\partial x_i} = 0$ and $\frac{\partial F}{\partial \lambda_j} = 0$ where $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$
Setting $\frac{\partial F}{\partial x_i} = 0$ for all $i$ gives us $$\vec{\nabla}f = \lambda_1 \vec{\nabla} g_1(\vec{x}) + \lambda_2 \vec{\nabla} g_2(\vec{x}) + \cdots + \lambda_m \vec{\nabla} g_m(\vec{x}),$$ i.e. the gradient of $f$ must be a linear combination of the gradients of the constraints.
Setting $\frac{\partial F}{\partial \lambda_j} = 0$ gives us $$g_j(\vec{x}) = 0$$ for each $j \in \{1,2,\ldots,m\}$.
Hence, when we find the stationary points of $F$, the constraints are automatically enforced. This means that the stationary points of $F$ correspond to the candidate extrema of $f$ with the constraints enforced.
To decide whether a point obtained by solving the system is a minimum, a maximum, or a saddle point, we need to look at second-order information at that point, namely the definiteness of the Hessian of $F$ (more precisely, the bordered Hessian test).
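Putting the construction together, here is a minimal sketch of forming $F$ and solving all of its first-order conditions at once. The particular $f$ and $g$ are hypothetical examples of mine, and `sympy` is assumed:

```python
import sympy as sp

# Hypothetical example: minimize f(x, y) = x^2 + y^2 subject to
# g(x, y) = x + y - 2 = 0 via the auxiliary function F above.
x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = x + y - 2
F = f - lam * g

# Stationary points of F: dF/dx = dF/dy = dF/dlam = 0.
# Note dF/dlam = -g, so the constraint is enforced automatically.
eqs = [sp.diff(F, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)  # single candidate: x = y = 1, lam = 2
```

As the answer notes, the solver only returns stationary points of $F$; classifying the candidate still requires a second-order (bordered Hessian) check.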

+1. The approach Sivaram describes here also leads to a notion of duality for nonlinear optimization problems and ultimately to the important Karush-Kuhn-Tucker conditions. – Mike Spivey Feb 27 '11 at 05:49