This is a long question in which I explain my current understanding of certain ideas. If anyone is interested in reading this and would like to provide any commentary/feedback that may help me understand these ideas more clearly, or that you think I might find interesting, I'd appreciate it!

In multivariable calculus, to minimize a function $f:\mathbb R^N \to \mathbb R$ we write down the first-order optimality condition $\nabla f(x) = 0$ and solve for $x$. Multivariable calculus classes usually take this further and give a method for minimizing $f$ subject to an equality constraint $h(x) = 0$. You form the Lagrangian $L(x,z) = f(x) + \langle h(x), z \rangle$, write down the optimality condition

\begin{align} h(x) &= 0 \\ \frac{\partial L(x,z)}{\partial x} &= 0 \end{align} and solve for $x$ (and $z$).
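As a concrete sanity check (a standard textbook example, not from the discussion above): minimize $f(x) = x_1^2 + x_2^2$ subject to $h(x) = x_1 + x_2 - 1 = 0$. Then

\begin{align} L(x,z) &= x_1^2 + x_2^2 + z(x_1 + x_2 - 1), \\ 2x_1 + z &= 0, \qquad 2x_2 + z = 0, \qquad x_1 + x_2 = 1, \end{align}

which gives $x_1 = x_2 = 1/2$ and $z = -1$.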

Calculus classes normally stop there, but they could take it further still and show how to handle inequality constraints -- in that case the multipliers for the inequality constraints must be nonnegative, and "complementary slackness" appears in the optimality conditions (the KKT conditions).
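For concreteness, for a problem of the form: minimize $f(x)$ subject to $h(x) = 0$ and $g(x) \leq 0$ (with differentiable data), the KKT conditions read

\begin{align} \nabla f(x) + \sum_i z_i \nabla h_i(x) + \sum_j \lambda_j \nabla g_j(x) &= 0, \\ h(x) = 0, \qquad g(x) &\leq 0, \\ \lambda &\geq 0, \\ \lambda_j \, g_j(x) &= 0 \quad \text{for all } j, \end{align}

where $\lambda \geq 0$ is the nonnegativity constraint and the last line is complementary slackness.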

Under certain assumptions ("constraint qualifications"), these optimality conditions are necessary (but not sufficient) for $x$ to be a local minimizer.

These optimality conditions can be derived by linearizing about a minimizer, or by drawing some pictures and making some geometric arguments. (Nocedal and Wright present this material well.)

So far, we have not mentioned convexity at all, nor have we mentioned the dual problem. So how do those ideas fit into the picture?

A calculus class could mention that if our optimization problem is convex, then the necessary conditions in fact are sufficient for $x$ to be a global minimizer.

But there is more to be said about the convex case. When working with convex functions, it's unnatural to require them to be differentiable. We should allow for non-differentiable convex functions and use subgradients rather than gradients. Indeed, a version of the KKT conditions for convex optimization problems can be given that does not assume differentiability.
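One standard way to state such a nonsmooth version (my paraphrase of the usual convex-analysis formulation): for the problem of minimizing convex $f$ subject to $g_i(x) \leq 0$ ($g_i$ convex) and $Ax = b$, the conditions become

\begin{align} 0 &\in \partial f(x) + \sum_i \lambda_i \, \partial g_i(x) + A^T z, \\ Ax &= b, \qquad g(x) \leq 0, \qquad \lambda \geq 0, \qquad \lambda_i \, g_i(x) = 0, \end{align}

with stationarity now a set inclusion involving subdifferentials rather than an equation involving gradients.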

And how do we prove that for a convex optimization problem, if an appropriate "constraint qualification" is satisfied, then the subgradient version of the KKT conditions is necessary and sufficient for $x$ to be a minimizer?

There are different ways to prove this. (QUESTION: Can anyone summarize the various approaches that could be taken to prove this?) One way, perhaps the best way, is to prove that: 1) if Slater's condition is satisfied, then strong duality holds and there is a dual optimal variable; 2) if strong duality holds, then the KKT conditions are satisfied by $x,z$ if and only if $x$ is primal optimal and $z$ is dual optimal. (This is the approach taken in Boyd and Vandenberghe.) For the first time in this development we have now mentioned the dual problem. I think other approaches to proving a KKT theorem in convex optimization don't use the dual problem at all.
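A tiny numerical illustration of the two-step route just described (a hypothetical example I chose, not from Boyd and Vandenberghe): minimize $f(x) = x^2$ subject to $1 - x \leq 0$. Slater's condition holds ($x = 2$ is strictly feasible), so strong duality should hold and the KKT conditions should pin down the primal/dual optimal pair.

```python
# Example problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.

def f(x):
    return x ** 2

def g(x):
    return 1.0 - x

# Lagrangian: L(x, lam) = x^2 + lam * (1 - x).
# Dual function: dual(lam) = inf_x L(x, lam), attained at x = lam / 2,
# which gives dual(lam) = lam - lam^2 / 4.
def dual(lam):
    return lam - lam ** 2 / 4.0

# KKT conditions: stationarity 2x - lam = 0, primal feasibility 1 - x <= 0,
# dual feasibility lam >= 0, complementary slackness lam * (1 - x) = 0.
# Solving them yields x* = 1, lam* = 2.
x_star, lam_star = 1.0, 2.0

assert abs(2 * x_star - lam_star) < 1e-12       # stationarity
assert g(x_star) <= 1e-12                       # primal feasibility
assert lam_star >= 0                            # dual feasibility
assert abs(lam_star * g(x_star)) < 1e-12        # complementary slackness

# Strong duality: the primal optimal value equals the dual optimal value.
assert abs(f(x_star) - dual(lam_star)) < 1e-12  # both equal 1
```

The dual function is maximized at $\lambda = 2$ with value $1$, matching the primal optimum $f(1) = 1$, exactly as the strong-duality step of the argument predicts.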

And where did this dual problem come from? How do we motivate that? What's the best way to think about it?

One of the key ideas of convex analysis is that a closed convex set $C$ is the intersection of all closed half-spaces that contain $C$. I think this could be called the source of all duality results in convex analysis. When this idea is applied to a convex cone $K$, we discover the dual cone $K^*$ and the fact that $K^{**} = K$. When this idea is applied to the epigraph of a closed convex function $f$, we discover the convex conjugate $f^*$ and the fact that $f^{**} = f$. It's tempting to apply this idea (somehow) to a convex optimization problem. How do we associate a closed convex set $C$ to a convex optimization problem? If the problem is
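To spell out the two instances just mentioned (standard definitions):

\begin{align} K^* &= \{ y \mid \langle x, y \rangle \geq 0 \text{ for all } x \in K \}, \\ f^*(y) &= \sup_x \left( \langle y, x \rangle - f(x) \right), \end{align}

and it is the half-space description of the closed convex cone $K$ (respectively, of the epigraph of the closed convex function $f$) that yields $K^{**} = K$ (respectively, $f^{**} = f$).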

\begin{align} \text{minimize}_x & \quad f(x) \\ \text{subject to} & \quad Ax = b \\ &\quad h(x) \leq 0, \end{align} and $\mathcal D$ is the domain of the problem, then we can form something like an "epigraph" for this optimization problem as follows: \begin{equation} C = \{ (u,v,t) \mid \exists x \in \mathcal D \text{ such that } u = Ax - b,\; h(x) \leq v,\; f(x) \leq t \}. \end{equation}

By thinking of the optimization problem in terms of half-spaces containing $C$ (or in terms of hyperplanes supporting $C$), we get a geometric interpretation of the dual problem and a nice way to prove a strong duality theorem. (This is the approach taken in Boyd and Vandenberghe.)
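To sketch how the hyperplane picture produces the dual (my paraphrase of that construction): a nonvertical supporting hyperplane of $C$ with normal $(z, \lambda, 1)$, where $\lambda \geq 0$, satisfies

\begin{equation} \langle z, u \rangle + \langle \lambda, v \rangle + t \geq g(\lambda, z) \quad \text{for all } (u,v,t) \in C, \end{equation}

where $g(\lambda, z) = \inf_{x \in \mathcal D} \left( f(x) + \langle \lambda, h(x) \rangle + \langle z, Ax - b \rangle \right)$ is the Lagrange dual function. Since the primal optimal value is $p^\star = \inf \{ t \mid (0,0,t) \in C \}$, evaluating the inequality at points $(0,0,t)$ gives $p^\star \geq g(\lambda, z)$, i.e. weak duality; strong duality corresponds to the existence of a supporting hyperplane through $(0,0,p^\star)$.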

QUESTIONS: Have I explained anything here incorrectly? Is there a better way to organize or understand these ideas, or a better way to put them in perspective? Do you like this way of thinking about these ideas? Do you have any comments, even if not directly related, that you think I might find to be interesting or that might help me gain a deeper understanding of these ideas?

Thanks!