I'm studying support vector machines, and in the process I've bumped into Lagrange multipliers with multiple constraints and the Karush–Kuhn–Tucker conditions.

I've been trying to study the subject, but still can't get a good enough grasp of it. Wikipedia says that in order to find the extremum points of a function $f$ (with constraints $g_1, \ldots, g_m$), we must find a point $\text{x}$ and multipliers $\lambda_1, \ldots, \lambda_m$ such that

$$\sum_{i=1}^{m}\lambda_{i}\nabla g_i(\text{x}) = \nabla f(\text{x})$$

I understand Lagrange multipliers when there is only one constraint, but the case of several constraints is hard to grasp for some reason... :(

Could anyone give me an easy-to-understand explanation of why the equation above is true?

Thank you for any guidance :)


If it is not too much work, I'd be very grateful if someone could also explain the Karush–Kuhn–Tucker conditions, which generalize my question :) That would be super!


1 Answer


Consider a point $p$ in the common domain $\Omega\subset{\mathbb R}^n$ of $f$ and the constraints $$g_k(x)=0\qquad(1\leq k\leq r)\ .\tag{1}$$ The gradients $\nabla g_k(p)$ define a subspace $U$ of allowed directions when walking away from $p$. In fact a direction $X$ is allowed only if it belongs to the tangent planes of all level surfaces $(1)$. This means that $X$ is perpendicular to all $\nabla g_k(p)$, or is a solution of the homogeneous system of equations $$\nabla g_k(p)\cdot X=0\qquad(1\leq k\leq r)\ .\tag{2}$$

Now comes an important technical condition for the application of Lagrange's method: We have to assume that the $r$ gradients $\nabla g_k(p)$ are linearly independent, i.e. that $p$ is a regular point of the manifold defined by $(1)$. In this case the $\nabla g_k(p)$ span an $r$-dimensional subspace $V$, and the system $(2)$ has full rank. It follows that ${\rm dim}(U)= n-r$. Therefore we not only have $U\subset V^\perp$, but in fact $$U=V^\perp\ .$$
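To make the dimension count concrete, here is a small sketch in plain Python. The example is an assumption chosen only for illustration and is not part of the argument above: the constraints are the unit sphere and a plane through the origin in ${\mathbb R}^3$ (so $n=3$, $r=2$), and the point `p` and the helper names `dot`, `cross`, `grad_g1`, `grad_g2` are hypothetical. With $n-r=1$, the space $U$ should be a line, and in ${\mathbb R}^3$ that line is spanned by the cross product of the two gradients.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Illustrative constraints (assumed, not from the answer):
#   g1(x) = x1^2 + x2^2 + x3^2 - 1 = 0   (unit sphere)
#   g2(x) = x1 + x2 + x3           = 0   (plane through the origin)
def grad_g1(p):
    return tuple(2 * c for c in p)

def grad_g2(p):
    return (1.0, 1.0, 1.0)

s = 1 / math.sqrt(6)
p = (-s, -s, 2 * s)          # a point satisfying both constraints

a, b = grad_g1(p), grad_g2(p)

# Gram determinant of the two gradients: it is nonzero exactly when
# they are linearly independent, i.e. when p is a regular point.
gram_det = dot(a, a) * dot(b, b) - dot(a, b) ** 2

# For n = 3, r = 2 the solution space U of system (2) is one-dimensional
# and spanned by the cross product of the gradients.
X = cross(a, b)

print("regular point:", abs(gram_det) > 1e-12)
print("grad g1 . X =", dot(a, X))
print("grad g2 . X =", dot(b, X))
```

Both dot products vanish up to rounding, so `X` solves system $(2)$. At a point where the Gram determinant is zero the cross product would be the zero vector, and the dimension count ${\rm dim}(U)=n-r$ would fail.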

When $\nabla f(p)\cdot X\ne0$ for some allowed direction $X$ then the function $f$ is not conditionally stationary at $p$. For a constrained local extremum of $f$ at $p$ we therefore need $$\nabla f(p)\cdot X=0$$ for all directions $X\in U$, in other words: It is necessary that $$\nabla f(p)\in U^\perp=V\ .\tag{3}$$ When $\nabla f(p)\in V=\langle\nabla g_1(p),\ldots,\nabla g_r(p)\rangle$ then there are numbers $\lambda_k$ $\>(1\leq k\leq r)$ such that $$\nabla f(p)=\sum_{k=1}^r \lambda_k\>\nabla g_k(p)\ .\tag{4}$$ Solving $(4)$ (with $x$ in place of $p$) together with $(1)$ will bring all regular constrained extrema of $f$ to the fore.
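Condition $(4)$ can be checked numerically on a toy problem. The following is a minimal sketch under assumptions of my own, not taken from the answer: $f(x)=x_3$ is to be maximized on the intersection of the unit sphere $x_1^2+x_2^2+x_3^2=1$ with the plane $x_1+x_2+x_3=0$. The helper `multipliers` is a hypothetical name; it projects $\nabla f$ onto the span of the two constraint gradients and returns the coefficients together with the part of $\nabla f$ left outside the span.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def multipliers(grad_f, a, b):
    """Least-squares coefficients (l1, l2) of grad_f in span{a, b},
    plus the component of grad_f that lies outside that span."""
    g11, g12, g22 = dot(a, a), dot(a, b), dot(b, b)
    c1, c2 = dot(a, grad_f), dot(b, grad_f)
    det = g11 * g22 - g12 ** 2           # nonzero at a regular point
    l1 = (c1 * g22 - c2 * g12) / det     # Cramer's rule, 2x2 normal equations
    l2 = (g11 * c2 - g12 * c1) / det
    residual = tuple(f - l1 * x - l2 * y for f, x, y in zip(grad_f, a, b))
    return l1, l2, residual

def constraint_grads(p):
    # Gradients of g1 = |x|^2 - 1 and g2 = x1 + x2 + x3 (assumed example).
    return tuple(2 * c for c in p), (1.0, 1.0, 1.0)

grad_f = (0.0, 0.0, 1.0)                 # gradient of f(x) = x3

# p is the constrained maximum of f; q is another feasible point.
s = 1 / math.sqrt(6)
p = (-s, -s, 2 * s)
t = 1 / math.sqrt(2)
q = (t, -t, 0.0)

l1, l2, res_p = multipliers(grad_f, *constraint_grads(p))
_, _, res_q = multipliers(grad_f, *constraint_grads(q))

print("multipliers at p:", l1, l2)
print("|residual| at p:", math.sqrt(dot(res_p, res_p)))
print("|residual| at q:", math.sqrt(dot(res_q, res_q)))
```

At `p` the residual vanishes and the multipliers come out as $\lambda_1=1/\sqrt6$, $\lambda_2=1/3$, so $(4)$ holds there. At `q` the residual is nonzero: $\nabla f(q)$ has a component along an allowed direction, and $f$ is not conditionally stationary at `q`.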

Christian Blatter
  • Extremely helpful, thank you -- is the requirement that the gradients be linearly independent necessary, so that the multipliers are unique? Are there other reasons? – one_observation Aug 01 '16 at 19:14
  • @Sophologist: Here is a counterexample that shows that the requirement is necessary: http://math.stackexchange.com/questions/147338/lagrange-multiplier-question-finding-a-counterexample – Christian Blatter Aug 01 '16 at 19:21