I'm working through an optimization problem that reformulates the problem in terms of KKT conditions. Can someone please have a go at explaining the following in simple terms?

  • What do we gain by rewriting an optimization problem in terms of KKT conditions? It seems like we are just writing our original constrained optimization problem as a different constrained optimization problem that isn't any easier to solve.

  • In fact, for constrained optimization problems without inequality constraints, what exactly do we gain by using the method of Lagrange multipliers at all? All we get is a nonlinear system of equations which is in general not easy to solve.

If we require numerical methods to solve the reformulated problem, why not just use numerical methods in the first place?

  • You gain the ability to solve the problem... – Mikhail May 05 '13 at 05:04
  • But you don't, in the case of KKT you get another optimization problem which still has inequality constraints with no obvious analytic solution. – Flash May 05 '13 at 05:08
  • The general scheme is KKT -> Newton-Raphson -> Ax=b problem. Where you made A larger due to your constraints. I think this website has a few examples: http://mat.gsia.cmu.edu/classes/QUANT/NOTES/chap4/node6.html. – Mikhail May 05 '13 at 06:02
  • In calculus if you want to minimize $f(x)$, you can often do it by solving the optimality condition $\nabla f(x) = 0$. What if you wanted to minimize $f(x)$ subject to certain constraints? We still can often take the same approach -- write down the optimality condition (in this case KKT conditions) and solve for $x$. – littleO Aug 05 '13 at 02:27

4 Answers

  1. As any practitioner knows, the "pain" in solving a constrained optimization problem is figuring out which constraints bind and which constraints are slack. If we knew beforehand which constraints really matter, we could just use Lagrangian methods. The KKT conditions give us a systematic way of checking which constraints are active (bind) and which are not.
  2. By using Lagrangian methods we gain valuable information about the problem. Economists refer to the Lagrange multipliers as "shadow prices" because if you have a constraint, say $g(x)= c$, and the associated multiplier is $\lambda$, then you know how the optimal value of your objective function $f$ would change if you were able to change $c$: $$\dfrac{\partial}{\partial c}\left( \max\limits_{x \text{ st } g(x)=c} f(x) \right)=\lambda.$$
  3. Even the numerical methods you propose to use employ variations of KKT and Lagrangian ideas. Another natural way of thinking about the multipliers in a KKT problem is to view them as penalties for violating the constraints. Say we have a constraint $g(x)\le c$ in a maximization problem; then the Lagrangian is $f(x)-\lambda\cdot (g(x)-c)$, and since $\lambda\ge 0$, if a numerical routine picked an $x$ that violated the constraint, this would reduce the value of the Lagrangian: the higher the $\lambda$, the higher the reduction/penalty.
  4. Finally, some problems are more easily solved by looking at the dual problem (duality is better known for linear programming, but we can also use it in nonlinear programming) instead of the original problem itself. Without referencing/using Lagrange multipliers we cannot even formulate what the dual problem is.
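The shadow-price identity in item 2 is easy to sanity-check numerically. Here is a minimal sketch (the quadratic problem, the step size `h`, and all constants are my own illustration, not from the answer): minimize $f(x)=(x-3)^2$ subject to $g(x)=x=c$, so the optimal value is $V(c)=(c-3)^2$ and stationarity of the Lagrangian gives $\lambda = f'(c) = 2(c-3)$.

```python
# Numerical check of the shadow-price interpretation on a toy problem:
# minimize f(x) = (x - 3)**2 subject to g(x) = x = c.
# The constrained optimum is x* = c, with value V(c) = (c - 3)**2.
# Stationarity of L(x, lam) = f(x) - lam*(g(x) - c) gives
# f'(x*) = lam * g'(x*), i.e. lam = 2*(c - 3).

def multiplier(c):
    # lam from the first-order condition f'(c) = lam * g'(c) = lam
    return 2.0 * (c - 3.0)

def value(c):
    # optimal value of the constrained problem as a function of c
    return (c - 3.0) ** 2

c, h = 1.0, 1e-6
# central finite-difference approximation of dV/dc
dV_dc = (value(c + h) - value(c - h)) / (2 * h)
print(dV_dc, multiplier(c))   # both approximately -4.0
```

The finite-difference derivative of the optimal value matches the multiplier, exactly as the $\partial V/\partial c = \lambda$ formula predicts.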
Sergio Parreiras

By using Lagrange multipliers or the KKT conditions, you transform an optimization problem ("minimize some quantity") into a system of equations and inequalities -- it is no longer an optimization problem.

The new problem can be easier to solve. It is also easier to check if a point is a solution. But there are also a few drawbacks: for instance, it only gives a necessary condition.

This is the same difference as between "find $\min_x f(x)$" and "solve $f'(x)=0$", where $f:\mathbb{R} \longrightarrow \mathbb{R}$.
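That one-dimensional analogy can be made concrete. A small sketch (the function and starting point are my own illustration): instead of minimizing $f(x)=(x-2)^2+1$ directly, solve the optimality condition $f'(x)=0$, here with Newton's method.

```python
# "find min f(x)" vs "solve f'(x) = 0" for the toy function
# f(x) = (x - 2)**2 + 1, whose optimality condition is
# f'(x) = 2*(x - 2) = 0  =>  x = 2.

def f(x):
    return (x - 2.0) ** 2 + 1.0

def fprime(x):
    return 2.0 * (x - 2.0)

# Newton's method on f'(x) = 0; the second derivative is the constant 2.
x = 10.0
for _ in range(50):
    x = x - fprime(x) / 2.0

print(x, f(x))   # converges to x = 2.0, f(x) = 1.0
```

Because $f$ is quadratic, Newton lands on the root of $f'$ in one step; for a general $f$ the same scheme iterates, which is exactly the "KKT -> Newton-Raphson" pipeline mentioned in the comments above.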

  • In many practical cases we have enough assumptions so that the KKT conditions are necessary and sufficient. Similarly, if we assume $f$ is strictly convex, then $f'(x)=0$ is a sufficient condition for a minimum. – Sergio Parreiras Nov 08 '13 at 18:44

Starting from your second question: what do we gain when the problem has only equality constraints, and if in any case it is a problem that will be solved by numerical methods, why bother?

There are many cases where the problem under study is posed in abstract terms - no numbers in sight, just symbols for the variables, and yet more symbols for the coefficients/parameters. In such a case, the exact quantitative solution is not something we care about - what we do care about is characterizing the solution, i.e. determining its qualitative characteristics (like existence, uniqueness, robustness to perturbations, etc.). But even if we do seek the specific quantitative solution, numerical optimization does not characterize the solution - it just gives it to you (and why are we so certain that the algorithm didn't mess up? - especially with ill-behaved objective functions). Knowing only this leaves a lot of uncertainty. Numerical optimization will give you the coordinates and the height of the peak of the mountain - don't you think it is important to know how snowy and slippery the rocks that surround the peak are?

As for your first question: in the analytic elaboration of a problem it is usually an algebraic nightmare *not* to formulate it in Lagrange, KKT or (even better) Fritz John terms. Consider $\max_{x,y} f(x,y) \; \text{s.t.}\; g(x)=h(y)$. Without using the multiplier approach you would have to find, for example, $x=g^{-1}(h(y))$ and then face $\max_{y}f\left[g^{-1}(h(y)),y\right]$ - and I didn't even say that $x$ and $y$ are one-dimensional. What experience I have says that this is not an equally easy problem to work with.
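To make the substitution-vs-multipliers comparison concrete, here is a sketch on a toy instance of $\max_{x,y} f(x,y)$ s.t. $g(x)=h(y)$ (the functions $f(x,y)=-(x-1)^2-(y-1)^2$, $g(x)=x$, $h(y)=2y$ are my own choice, and I assume SymPy is available). The multiplier route reduces everything to one mechanical step: write down the Lagrangian and solve its stationarity conditions.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# Toy instance of max f(x, y) subject to g(x) = h(y):
# f(x, y) = -(x - 1)**2 - (y - 1)**2,  g(x) = x,  h(y) = 2*y,
# i.e. the constraint is x - 2*y = 0.
f = -(x - 1)**2 - (y - 1)**2
constraint = x - 2*y

# Lagrangian and its stationarity conditions (derivatives in x, y, lam)
L = f - lam * constraint
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, (x, y, lam), dict=True)
print(sol)   # [{x: 6/5, y: 3/5, lam: -2/5}]
```

The substitution route on the same problem (set $x = 2y$, maximize $-(2y-1)^2-(y-1)^2$ over $y$) gives the same point $y=3/5$, $x=6/5$, but already for this trivial $g$ and $h$ it requires inverting the constraint by hand; for a general implicit constraint that inversion may not exist in closed form, which is the answer's point.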

Alecos Papadopoulos

I think the key is that you transform the optimization problem into a system of equations, which doesn't necessarily require optimization software: it can sometimes be solved directly via the derivatives.