13

There are two standard ways to define the lasso.

Regression with constraint definition: $$\min\limits_{\beta} \|y-X\beta\|^2 \quad \text{subject to} \quad \sum\limits_{p}|\beta_p|\leq t, \quad \text{for some } t \geq 0.$$ Regression with penalty definition: $$\min\limits_{\beta} \|y-X\beta\|^2+\lambda\sum\limits_{p}|\beta_p|, \quad \text{for some } \lambda \geq 0.$$
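For concreteness, here is a small numerical sketch of what I mean by equivalence (this illustrates rather than proves the claim; it assumes the `cvxpy` package, and the random data and the value of $\lambda$ are made up): solve the penalized problem for one $\lambda$, set $t$ equal to the $\ell_1$ norm of that solution, and check that the constrained problem returns the same coefficients.

```python
# Numerical sketch (not a proof): penalized lasso for a fixed lambda, then the
# constrained lasso with t = ||beta_pen||_1, and a check that they coincide.
# Assumes numpy and cvxpy are installed; data and lambda are made up.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

lam = 5.0  # an arbitrary penalty level

# Penalized form: min ||y - X b||^2 + lam * ||b||_1
b_pen = cp.Variable(p)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b_pen)
                       + lam * cp.norm1(b_pen))).solve()

# Constrained form with t = ||b_pen||_1: min ||y - X b||^2  s.t.  ||b||_1 <= t
t = np.abs(b_pen.value).sum()
b_con = cp.Variable(p)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b_con)),
           [cp.norm1(b_con) <= t]).solve()

# Maximum coefficient difference; ~0 up to solver tolerance
print(np.max(np.abs(b_pen.value - b_con.value)))
```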

But how can one show that these two definitions are equivalent for some $t$ and $\lambda$? I think Lagrange multipliers are the key to showing the relationship between the two definitions. However, I failed to work it out rigorously, because I assumed the lasso property $\sum\limits_{p}|\beta_p|=t$ (that the constraint is active) in the regression-with-constraint definition.

Can anyone show me a complete and rigorous proof that these two definitions are equivalent for some $t$ and $\lambda$?

Thank you very much if you can help.

EDIT: Following the comments below, I have edited my question.

Timespace
  • 1
    I think you have a problem here... $y-X\beta$ is a vector, so the squared term is ill-posed. Furthermore, $\|\beta\|$ is a scalar, with no subscripts to sum over. I'm thinking you've put the summation in the wrong place. For instance, I suspect the penalty definition is $\left(\sum_i(y_i-X_i\beta)^2\right)+\lambda\|\beta\|$. – Michael Grant Jun 10 '13 at 04:20
  • 1
    Furthermore, there is certainly not a one-to-one correspondence between $\lambda$ and $t$ without further qualifications. For instance, let $\bar{\beta}$ be the minimizer of the penalty definition with $\lambda=0$. Then the optimal value of the constrained problem is the same for any $t\geq\|\bar{\beta}\|$. Thus all values of $t\in[\|\bar{\beta}\|,+\infty)$ correspond to $\lambda=0$. Similarly, for some choices of the norm $\|\beta\|$, there may be an infinite interval of $\lambda$ values corresponding to $t=0$. – Michael Grant Jun 10 '13 at 04:31
  • Your edits are not sufficient. First of all, $\beta_i$ is a scalar, so $\|\beta_i\|$ is just $|\beta_i|$, correct? Given that this is the LASSO I'd just replace the whole summation with $\|\beta\|_1$ and be done with it. But there is still the matter of the quantity $(y-X\beta)^2$, which is a vector, not a scalar. So the objective function is ill-posed. – Michael Grant Jun 10 '13 at 17:06

2 Answers

8

Here is one direction.

(1) The constrained problem is of the form $$\begin{array}{ll} \text{Find} & x \\ \text{To minimize} & f(x) \\ \text{such that} & g(x) \leqslant t \\ & -g(x) \leqslant t. \end{array}$$ Its Lagrangian is $$ L(x, \mu_1, \mu_2) = f(x) + \mu_1' ( g(x) - t ) + \mu_2' ( - g(x) - t ) $$ and the KKT conditions are \begin{align*} \nabla f + \mu_1' \nabla g - \mu_2' \nabla g &= 0 \\ \mu_1, \mu_2 &\geqslant 0 \\ \mu_1' ( g(x) - t ) &= 0 \\ \mu_2' ( - g(x) - t ) &= 0 . \end{align*}

(2) The penalized problem is just the minimization of $f(x) + \lambda' g(x)$. It is unconstrained, and the first order condition is $$ \nabla f + \lambda ' \nabla g = 0. $$

Given a solution of the constrained problem, the penalized problem with $\lambda = \mu_1 - \mu_2$ has the same solution. (For a complete proof, you also need to check that, in your situation, the KKT conditions and the first order condition are necessary and sufficient conditions.)
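As a numerical illustration of this direction (a sketch only, not part of the proof; it assumes the `cvxpy` package, made-up data, and an arbitrary choice of $t$): a convex solver also returns the KKT multiplier of the active $g(x) \leqslant t$ constraint, and for $t > 0$ the $-g(x) \leqslant t$ side is inactive, so that single multiplier $\mu$ plays the role of $\lambda$. Using it as the penalty level reproduces the constrained solution.

```python
# Sketch of the lambda = mu correspondence: read the KKT multiplier of the
# l1-ball constraint off the constrained problem, then check that the
# penalized problem with lambda = mu has the same solution.
# Assumes numpy and cvxpy; the data and the choice of t are made up.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, p = 80, 15
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

t = 1.0  # small enough that the constraint is active

b_con = cp.Variable(p)
constraint = cp.norm1(b_con) <= t
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b_con)), [constraint]).solve()

mu = float(constraint.dual_value)  # KKT multiplier of the l1 constraint
b_pen = cp.Variable(p)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b_pen)
                       + mu * cp.norm1(b_pen))).solve()

# mu plays the role of lambda; the coefficient difference is ~0
print(mu, np.max(np.abs(b_con.value - b_pen.value)))
```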

  • 1
    Aren't the first-order conditions for the penalised problem formulation inapplicable here? The penalty function $g$ is not differentiable here. – Vossler Mar 26 '16 at 20:20
  • Also, the constrained problem is not of the form that you wrote since it is a sum of absolute values, not the absolute value of the sum. – Vossler Mar 26 '16 at 21:16
  • 1
    @Vossler: the KKT (and first order) conditions are still applicable with a subgradient instead of a gradient: since $g$ is convex, it has a subgradient. The correct constraint can be recovered by setting $g(x) = \left\| x \right\| _1$; the $-g(x) \leqslant t$ constraint I had added is then no longer needed. For a better explanation of the equivalence between the constrained and penalized formulations of the lasso, one can check [Statistical Learning with Sparsity](https://trevorhastie.github.io/), in particular exercises 5.2 to 5.4. – Vincent Zoonekynd Mar 27 '16 at 15:09
0

It's not really intuitive to see, but here is one way to look at it using only elementary arguments.

Suppose $\beta^{*}$ is a solution to the regression with penalty problem (with some $\lambda$) and $\beta^{**}$ is a solution to the regression with constraint problem with $t = |\beta^{*}|$ (where $|\bullet|$ denotes the $\ell_1$ norm : $|\beta| = \sum\limits_{p}|\beta_p|$). We show that the two problems are equivalent in the sense that $\beta^{*}$ is also a solution to the constraint problem and that $\beta^{**}$ is also a solution to the penalty problem.

  1. Because $\beta^{*}$ is a solution of the penalty problem, for all $\beta$ we have $\|y-X\beta\|^2+\lambda|\beta| \ge \|y-X\beta^{*}\|^2+\lambda|\beta^{*}|$.
    In particular, every $\beta$ with $|\beta| \le t = |\beta^{*}|$ satisfies $\lambda|\beta| \le \lambda|\beta^{*}|$, so the inequality above gives $\|y-X\beta\|^2 \ge \|y-X\beta^{*}\|^2 + \lambda(|\beta^{*}|-|\beta|) \ge \|y-X\beta^{*}\|^2$, from which we conclude that $\beta^{*}$ is a solution to the constraint problem.
  2. Because $\beta^{**}$ is a solution of the constraint problem, and $\beta^{*}$ is feasible for it, we have $|\beta^{**}| \le t=|\beta^{*}|$ and $\|y-X\beta^{*}\|^2 \ge \|y-X\beta^{**}\|^2$; and because $\beta^{*}$ is a solution of the penalty problem we have
    $\forall \beta, \ \|y-X\beta\|^2+\lambda|\beta| \ge \|y-X\beta^{*}\|^2+\lambda|\beta^{*}|$.
    Combining these, $\forall \beta, \ \|y-X\beta\|^2+\lambda|\beta| \ge \|y-X\beta^{*}\|^2+\lambda|\beta^{*}| \ge \|y-X\beta^{**}\|^2+\lambda|\beta^{**}|$, which allows us to say that $\beta^{**}$ is a solution to the penalty problem.

We can easily see that $|\beta^{*}| = |\beta^{**}|$ and $\|y-X\beta^{*}\|^2 = \|y-X\beta^{**}\|^2$ but we don't really need this in the proof.
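As a concrete sanity check of this argument (an illustrative sketch, not part of the proof above): in the one-dimensional case with a single observation $y$ and $X=1$, the penalized solution is given by soft-thresholding and the constrained solution by clipping, and taking $t = |\beta^{*}|$ makes them coincide.

```python
# One-dimensional sanity check (a sketch, not a proof).
# For a single observation y and X = 1, the penalized lasso has the closed-form
# soft-threshold solution, and the constrained problem with t = |beta_pen| is
# solved by clipping y to [-t, t]; the two coincide for every y and lambda.
import numpy as np

def penalized_1d(y, lam):
    # argmin_b (y - b)^2 + lam * |b|  (soft-thresholding)
    return np.sign(y) * max(abs(y) - lam / 2.0, 0.0)

def constrained_1d(y, t):
    # argmin_b (y - b)^2 subject to |b| <= t  (clipping)
    return np.clip(y, -t, t)

for y in [2.3, -0.4, 0.05]:
    for lam in [0.1, 1.0, 5.0]:
        b_pen = penalized_1d(y, lam)
        b_con = constrained_1d(y, abs(b_pen))      # t = |beta_pen|
        assert np.isclose(b_pen, b_con), (y, lam)  # the two solutions agree
print("penalized and constrained solutions agree in the 1-D case")
```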