
I've always used the method of Lagrange multipliers with blind confidence that it will give the correct results when optimizing problems with constraints. But I would like to know if anyone can provide or recommend a derivation of the method at physics undergraduate level that can highlight its limitations, if any.

Rodrigo de Azevedo
John McVirgooo

2 Answers


Lagrange multipliers are used to obtain the maximum of a function $f(\mathbf{x})$ on a surface $\{ \mathbf{x}\in\mathbb{R}^n\mid g(\mathbf{x}) = 0\}$ (I use "surface", but whether it is a 2-dimensional, 1-dimensional, or whatever-dimensional object will depend on the $g$ and the $\mathbb{R}^n$ we are dealing with).

The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$. If we want to find the largest value of $f$ along the surface $g(\mathbf{x}) = 0$, then we need the direction of greatest increase to be orthogonal to the surface; otherwise, moving along the surface will "capture" some of that increase and $f$ will not achieve its maximum on the surface at that point (this is akin to the fact that in one-variable calculus, the derivative should be $0$ at the maximum, otherwise moving a bit in one direction will increase the value of the function).

In order for $\nabla f$ to be perpendicular to the surface, it must be parallel to the gradient of $g$; so $\nabla f$ must be a scalar multiple of $\nabla g$. So this amounts to finding a solution to the system \begin{align*} \nabla f(\mathbf{x}) &= \lambda \nabla g(\mathbf{x})\\ g(\mathbf{x}) &= 0 \end{align*} for both $\mathbf{x}$ and $\lambda$.
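As a rough numerical illustration of this system, here is a minimal sketch for a hypothetical example that is not part of the original question: maximize $f(x,y) = x + y$ on the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$. Solving by hand gives the candidates $x = y = \pm 1/\sqrt{2}$ with $\lambda = \pm 1/\sqrt{2}$; the code only checks that these satisfy the system and compares against a brute-force scan of the circle.

```python
import math

# Hypothetical example (an assumption for illustration):
# f(x, y) = x + y on the unit circle g(x, y) = x^2 + y^2 - 1 = 0.
def f(x, y):
    return x + y

def g(x, y):
    return x * x + y * y - 1.0

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def lagrange_residual(x, y, lam):
    """Largest absolute residual of: grad f = lam * grad g, and g = 0."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return max(abs(fx - lam * gx), abs(fy - lam * gy), abs(g(x, y)))

s = 1.0 / math.sqrt(2.0)
# Hand-solved candidates (x, y, lambda).
candidates = [(s, s, s), (-s, -s, -s)]
residuals = [lagrange_residual(x, y, lam) for (x, y, lam) in candidates]

# Brute-force maximum of f over sample points on the circle, for comparison.
n = 100000
brute_max = max(
    f(math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
    for k in range(n)
)
lagrange_max = max(f(x, y) for (x, y, _) in candidates)
```

Both candidates solve the system; comparing the values of $f$ at them is what singles out the maximum from the minimum.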

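Added. Such a point is not guaranteed to be a maximum or a minimum; it could also be a saddle point, or nothing at all, much as in the one-variable case, where points with $f'(x)=0$ are not guaranteed to be extremes of the function. Another obvious limitation is that if the function $g$ defining the surface is not differentiable (does not have a well-defined gradient) then you cannot even set up the system. A subtler one is that the argument assumes $\nabla g \neq \mathbf{0}$ at the point in question; where $\nabla g$ vanishes, a constrained extremum need not satisfy the system.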
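To make the first limitation concrete, here is a small sketch with a hypothetical example chosen only for illustration: $f(x,y) = x^3$ restricted to the line $g(x,y) = y = 0$. The origin solves the system with $\lambda = 0$, yet $f$ takes smaller values on one side and larger values on the other, so it is not an extremum.

```python
# Hypothetical example (an assumption for illustration):
# f(x, y) = x^3 restricted to the line g(x, y) = y = 0.
def f(x, y):
    return x ** 3

def grad_f(x, y):
    return (3.0 * x * x, 0.0)

def grad_g(x, y):
    return (0.0, 1.0)

x0, y0, lam = 0.0, 0.0, 0.0
fx, fy = grad_f(x0, y0)
gx, gy = grad_g(x0, y0)

# The Lagrange system holds at the origin with lambda = 0 ...
satisfies_system = (fx == lam * gx) and (fy == lam * gy) and (y0 == 0.0)

# ... but moving along the constraint changes f in both directions.
eps = 1e-3
left = f(x0 - eps, 0.0)
right = f(x0 + eps, 0.0)
center = f(x0, y0)
is_local_max = left < center and right < center
is_local_min = left > center and right > center
is_extremum = is_local_max or is_local_min
```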

Arturo Magidin
  • +1, although I would add, as a nod to the OP's request to "highlight its limitations," that not every solution to the system is guaranteed to be a maximum or minimum of $f$ on $g(\mathbf{x}) = 0$ (as with the single-variable case with the derivative being zero). – Mike Spivey Feb 27 '11 at 05:17
  • @Mike Good point. – Arturo Magidin Feb 27 '11 at 05:19
  • Very easy to understand, thanks. – John McVirgooo Feb 27 '11 at 05:20
  • An answer by @ArturoMagidin! I think I will upvote without reading it. And then I'll read it. – badatmath Nov 15 '12 at 00:33
  • Can you please elaborate on what you mean by "The gradient of $f$, $\nabla f$, points in the direction of greatest increase for $f$."? Do you mean that $\nabla f$ points in the direction of the greatest increase for $f$ at the point where $f$ is maximal? – M Smith Dec 02 '15 at 16:29
  • Nice answer, but I think you have an extra copy of the words "will increase" – J. W. Tanner Feb 03 '20 at 01:02
  • @J.W.Tanner: Yes, but I'm not going to bump an 11-year-old question to correct a bit of grammar that does not obscure the meaning nor misleads the reader. – Arturo Magidin Feb 03 '20 at 01:31

An algebraic way of looking at this is as follows:

From an algebraic viewpoint, we know how to find the extremum of a function of many variables. Say we want to find the extremum of $f(x_1,x_2,\ldots,x_n)$: we set the gradient to zero and look at the definiteness of the Hessian.

We would like to extend this idea to the case where we want to find the extremum of a function subject to some constraints. Say the problem is: $$\begin{align} \text{Minimize } & f(x_1,x_2,\ldots,x_n)\\ \text{subject to: } & g_k(x_1,x_2,\ldots,x_n) = 0\\ \text{where } & k \in \{1,2,\ldots,m\} \end{align} $$

If we find the extrema of $f$ just by setting the gradient of $f$ to zero, these extrema need not satisfy the constraints.

Hence, we would like to include the constraints in the previous idea. One way to do it is as follows. Define a new function: $$F(\vec{x},\vec{\lambda}) = f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \lambda_2 g_2(\vec{x}) - \cdots - \lambda_m g_m(\vec{x})$$ where $\vec{x} = \left[ x_1,x_2,\ldots,x_n \right], \vec{\lambda} = \left[\lambda_1,\lambda_2,\ldots,\lambda_m \right]$

Note that when the constraints are enforced, we have $F(\vec{x},\vec{\lambda}) = f(\vec{x})$, since $g_j(\vec{x}) = 0$ for every $j$.

Let us find the extremum of $F(\vec{x},\vec{\lambda})$. This is done by setting $\frac{\partial F}{\partial x_i} = 0$ and $\frac{\partial F}{\partial \lambda_j} = 0$ where $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$

Setting $\frac{\partial F}{\partial x_i} = 0$ gives us $$\vec{\nabla}f = \vec{\nabla}g \cdot \vec{\lambda}$$ where $\vec{\nabla}g = \left[\vec{\nabla} g_1(\vec{x}),\vec{\nabla} g_2(\vec{x}),\ldots,\vec{\nabla} g_m(\vec{x}) \right]$

Setting $\frac{\partial F}{\partial \lambda_j} = 0$ gives us $$g_j(\vec{x}) = 0$$ where $j \in \{1,2,\ldots,m\}$

Hence, when we find the stationary points of $F$, the constraints are automatically enforced. This means that the stationary points of $F$ correspond to the candidate extrema of $f$ with the constraints enforced.
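A minimal numerical sketch of this, for a hypothetical example chosen only for illustration: minimize $f(x,y) = x^2 + y^2$ subject to $g(x,y) = x + y - 1 = 0$, so that $F(x,y,\lambda) = x^2 + y^2 - \lambda(x + y - 1)$. Solving by hand gives $x = y = 1/2$, $\lambda = 1$. The code approximates the gradient of $F$ by central differences and checks that it vanishes there, while at the unconstrained minimizer of $f$ (the origin, with $\lambda = 0$) the $\lambda$-component does not vanish because the constraint is violated.

```python
# Hypothetical example (an assumption for illustration):
# f(x, y) = x^2 + y^2 with the single constraint g(x, y) = x + y - 1 = 0.
def F(x, y, lam):
    return x * x + y * y - lam * (x + y - 1.0)

def grad_F(x, y, lam, h=1e-6):
    """Central-difference approximation of [dF/dx, dF/dy, dF/dlam]."""
    point = [x, y, lam]
    grad = []
    for i in range(3):
        hi = list(point)
        lo = list(point)
        hi[i] += h
        lo[i] -= h
        grad.append((F(*hi) - F(*lo)) / (2.0 * h))
    return grad

# Hand-solved stationary point of F: x = y = 1/2, lambda = 1.
at_stationary = grad_F(0.5, 0.5, 1.0)

# Unconstrained minimizer of f: dF/dx and dF/dy vanish when lambda = 0,
# but dF/dlam = -(x + y - 1) = 1 there, flagging the violated constraint.
at_origin = grad_F(0.0, 0.0, 0.0)
```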

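To decide whether the point we obtain by solving the system is a minimum, a maximum, or a saddle point, we need to look at the definiteness of the Hessian of $F$ with respect to $\vec{x}$, restricted to the directions tangent to the constraints (those orthogonal to every $\vec{\nabla} g_k(\vec{x})$).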
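As a sketch of this second-order check, take the hypothetical example of minimizing $f(x,y) = x^2 + y^2$ subject to $x + y - 1 = 0$ (an assumption for illustration, with stationary point $x = y = 1/2$, $\lambda = 1$). The Hessian of $F$ in $(x, y)$ alone is $2I$, and its quadratic form along the tangent direction $(1,-1)$ of the constraint is positive, which indicates a constrained minimum. The full Hessian in $(x, y, \lambda)$ is indefinite in this example, so it is the form restricted to tangent directions that decides.

```python
def quad_form(H, v):
    """Evaluate v^T H v for a square matrix H given as nested lists."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

# Hessian of F(x, y, lam) = x^2 + y^2 - lam * (x + y - 1) in (x, y) only.
H_xx = [[2.0, 0.0],
        [0.0, 2.0]]

# grad g = (1, 1), so (1, -1) spans the tangent direction of the constraint.
tangent = [1.0, -1.0]
restricted = quad_form(H_xx, tangent)

# Full Hessian in (x, y, lam); the mixed entries are -dg/dx = -dg/dy = -1.
H_full = [[2.0, 0.0, -1.0],
          [0.0, 2.0, -1.0],
          [-1.0, -1.0, 0.0]]
positive_direction = quad_form(H_full, [1.0, 0.0, 0.0])
negative_direction = quad_form(H_full, [1.0, 1.0, 2.0])
```

Here `restricted` is positive, while `positive_direction` and `negative_direction` have opposite signs.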

  • +1. The approach Sivaram describes here also leads to a notion of duality for nonlinear optimization problems and ultimately to the important Karush-Kuhn-Tucker conditions. – Mike Spivey Feb 27 '11 at 05:49