Consider the following equality constraint minimization problem:

minimize $f(x)$

subject to $Ax=b$

Its Lagrangian is then:

$L(x,y) = f(x) + y^T(Ax-b)$

We can then apply **dual ascent** (gradient ascent on the dual function) to solve this problem, though it is said to converge only under rather strong assumptions on $f$.
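To check my understanding, here is a minimal NumPy sketch of dual ascent on a toy instance (the quadratic $f$, the matrix $A$, and the step size $\alpha$ are my own choices, not from any reference):

```python
import numpy as np

# Toy instance: minimize (1/2)||x||^2 subject to Ax = b.
# The closed-form answer is the minimum-norm solution x* = A^T (A A^T)^{-1} b.
A = np.array([[1., 2., 0.],
              [0., 1., 1.]])
b = np.array([1., 2.])
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Dual ascent: minimize L(x, y) over x in closed form, then take a
# gradient-ascent step on the dual. The step size alpha is hand-tuned.
y = np.zeros(2)
alpha = 0.15
for _ in range(300):
    x = -A.T @ y                  # argmin_x (1/2)||x||^2 + y^T(Ax - b)
    y = y + alpha * (A @ x - b)   # the dual gradient is the residual Ax - b

print(np.allclose(x, x_star))
```

This works because this particular $f$ is strongly convex, so the inner minimization has a unique solution; my question is precisely about what happens when it is not.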

Its Augmented Lagrangian is defined as:

$L_{\rho}(x,y) = f(x) + y^T(Ax-b) + \frac{\rho}{2} \|Ax-b \|^2_2$

Then we can use this in the **method of multipliers** (Hestenes and Powell, 1969), which is said to be designed *to robustify dual ascent*.
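For comparison, here is the same toy instance under the method of multipliers as I understand the iteration (the value $\rho = 10$ is an arbitrary choice of mine); note that the dual step size is $\rho$ itself rather than a tuned $\alpha$:

```python
import numpy as np

# Same toy instance: minimize (1/2)||x||^2 subject to Ax = b.
A = np.array([[1., 2., 0.],
              [0., 1., 1.]])
b = np.array([1., 2.])
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Method of multipliers: the x-step minimizes the augmented Lagrangian
# L_rho(x, y), and the multiplier step uses rho as the step size.
rho = 10.0
y = np.zeros(2)
for _ in range(100):
    # argmin_x (1/2)||x||^2 + y^T(Ax - b) + (rho/2)||Ax - b||^2
    x = np.linalg.solve(np.eye(3) + rho * A.T @ A, A.T @ (rho * b - y))
    y = y + rho * (A @ x - b)    # multiplier update with step size rho

print(np.allclose(x, x_star))
```

On this example the extra quadratic term visibly regularizes the x-minimization (the system matrix is $I + \rho A^T A$ instead of just $I$), which I suspect is part of the robustness story.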

Could anyone explain the intuition behind the augmented Lagrangian? As far as I can see, we are combining the penalty method with Lagrange multipliers. But how does this bring robustness?

**P.S.** Boyd says that *the benefit of including the penalty term is that $g_\rho$ (the dual function associated with the augmented Lagrangian) can be shown to be differentiable under rather mild conditions on the original problem.*

How does adding the extra term help make the construction "more differentiable"? Or is he referring to the fact that, say, when we deal with the L1 or nuclear norm, this extra term allows us to introduce a proximal operator?
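To make the differentiability point concrete, here is a toy calculation of my own (so treat it as an illustration, not Boyd's argument). Take $f(x)=0$ with the scalar constraint $x=b$. The ordinary dual function is

$g(y) = \inf_x \, y(x-b) = \begin{cases} 0 & y = 0 \\ -\infty & y \neq 0, \end{cases}$

which is useless, while the augmented dual is

$g_\rho(y) = \inf_x \, y(x-b) + \frac{\rho}{2}(x-b)^2 = -\frac{y^2}{2\rho},$

which is finite and smooth everywhere, with the infimum attained at $x = b - y/\rho$. Is this smoothing effect the general mechanism he means?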