I've studied convex optimization pretty carefully, but don't feel that I have yet "grokked" the dual problem. Here are some questions I would like to understand more deeply/clearly/simply:

- How would somebody think of the dual problem? What thought process would lead someone to consider the dual problem and to recognize that it's valuable/interesting?
- In the case of a convex optimization problem, is there any obvious reason to expect that strong duality should (usually) hold?
- It often happens that the dual of the dual problem is the primal problem. However, this seems like a complete surprise to me. Is there any intuitive reason to expect that this should happen?
- Does the use of the word "dual" or "duality" in optimization have anything to do with the dual space in linear algebra? Or are they just different concepts that go by the same name? What about the use of the word "dual" in projective geometry: is there a connection there?
- You can define the dual problem and prove theorems about strong duality without ever mentioning the Fenchel conjugate. For example, Boyd and Vandenberghe prove a strong duality theorem without mentioning the Fenchel conjugate in their proof. And yet, people often talk as if the Fenchel conjugate is somehow the "essence" of duality, and make it sound as if the whole theory of duality is based on the Fenchel conjugate. Why is the Fenchel conjugate considered to have such fundamental importance?

Note: I will now describe my current level of understanding of the intuition behind the dual problem. Please tell me if you think I might be missing any basic insights.

I have read the excellent notes about convex optimization by Guilherme Freitas, and in particular the part about "penalty intuition". When we are trying to solve

\begin{align*} \text{minimize} &\quad f(x) \\ \text{such that} & \quad h(x) \leq 0 \end{align*}

one might try to eliminate the constraints by introducing a penalty when constraints are violated. This gives us the new unconstrained problem

\begin{equation} \text{minimize} \quad f(x) + \langle \lambda ,h(x) \rangle \end{equation}

where $\lambda \geq 0$. It's not hard to see that for a given $\lambda \geq 0$, the optimal value of this unconstrained problem is less than or equal to the optimal value for the constrained problem. This gives us a new problem — find $\lambda$ so that the optimal value for the unconstrained problem is as large as possible. That is one way to imagine how somebody might have thought of the dual problem. Is this the best intuition for where the dual problem comes from?
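
To spell out the bound I have in mind (the symbols $g$, $p^\star$ and $\tilde{x}$ are my own notation here, not necessarily Freitas's): write $g(\lambda)$ for the optimal value of the unconstrained problem and $p^\star$ for the optimal value of the constrained problem. For any feasible point $\tilde{x}$, that is, any $\tilde{x}$ with $h(\tilde{x}) \leq 0$, and any $\lambda \geq 0$,

\begin{align*} g(\lambda) &= \inf_x \, \big( f(x) + \langle \lambda, h(x) \rangle \big) \\ &\leq f(\tilde{x}) + \langle \lambda, h(\tilde{x}) \rangle \\ &\leq f(\tilde{x}) \end{align*}

where the last step uses $\langle \lambda, h(\tilde{x}) \rangle \leq 0$, which holds because $\lambda \geq 0$ and $h(\tilde{x}) \leq 0$ componentwise. Taking the infimum over all feasible $\tilde{x}$ gives $g(\lambda) \leq p^\star$. So, as I understand it, the dual problem is

\begin{align*} \text{maximize} &\quad g(\lambda) \\ \text{such that} & \quad \lambda \geq 0 \end{align*}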

Another viewpoint: the KKT conditions can be derived using what Freitas calls the "geometric intuition". If we knew the value of the multipliers $\lambda$, it would often be much easier to find $x$. So a new problem is to find $\lambda$. And if we can somehow recognize that $\lambda$ is a maximizer for the dual problem, then this suggests that we might try solving the dual problem.
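
Here is a toy example of what I mean (my own, chosen only for illustration, with $f(x) = x^2$ and $h(x) = 1 - x$ for scalar $x$):

\begin{align*} \text{minimize} &\quad x^2 \\ \text{such that} & \quad 1 - x \leq 0 \end{align*}

For a fixed $\lambda \geq 0$, minimizing $x^2 + \lambda (1 - x)$ over $x$ is an easy unconstrained problem: setting the derivative to zero gives $2x - \lambda = 0$, so $x = \lambda / 2$. Substituting back gives the optimal value as a function of $\lambda$ (I'll call it $g$):

\begin{align*} g(\lambda) &= \frac{\lambda^2}{4} + \lambda \left( 1 - \frac{\lambda}{2} \right) \\ &= \lambda - \frac{\lambda^2}{4} \end{align*}

Maximizing $g$ over $\lambda \geq 0$ gives $1 - \lambda/2 = 0$, so $\lambda = 2$ and $g(2) = 1$. Plugging $\lambda = 2$ into $x = \lambda/2$ recovers $x = 1$, which is feasible, has objective value $1$ (the same as the dual value), and satisfies $\lambda (1 - x) = 0$. So in this small case, at least, finding $\lambda$ first really does hand us $x$ almost for free.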

Please explain or give references to any intuition that you think I might find interesting, even if it's not directly related to what I asked.