Let $A,A_h\in M_{n\times m}(\mathbb{R})$ be $n\times m$ matrices with $\Vert (A-A_h)x\Vert\leq h^2 \Vert x\Vert$ for all $x\in\mathbb{R}^m$ and $h\in (0,1)$, and let $e\in \mathbb{R}^n$. Now consider, for a fixed $\lambda>0$, the optimization problems $$\min_x\ e^TA_{(h)}x+\lambda\Vert x\Vert_2^2\quad \text{subject to}\quad \Vert x\Vert_1\leq C_1, \Vert x\Vert_\infty\leq C_2,$$ where $A_{(h)}$ stands for either $A$ or $A_h$.

I'll denote the minimizer of the problem with $A_h$ by $x_h$ and that of the problem with $A$ by $x_0$. I have already shown that $x_h$ converges to $x_0$ as $h\to 0$. I would furthermore like to get convergence rates, but so far I have only been able to derive one for the case where $x_0$ is strictly feasible.

My approach was to use the KKT conditions, i.e. $$ 0\in A_{(h)}^Te+2\lambda x+\mu_1^{(h)}\partial\Vert x \Vert_1+\mu_2^{(h)}\partial\Vert x \Vert_\infty,$$ $$\mu_1^{(h)}, \mu_2^{(h)}\geq0,\quad \mu_1^{(h)}(\Vert x \Vert_1-C_1)=0,\quad \mu_2^{(h)}(\Vert x \Vert_\infty-C_2)=0$$ $$ \Vert x\Vert_1\leq C_1, \Vert x\Vert_\infty\leq C_2.$$ I calculated the subdifferentials: $$\partial\Vert x \Vert_1=\{v:\Vert v\Vert_\infty\leq 1, v^Tx=\Vert x\Vert_1\}$$ $$\partial\Vert x \Vert_\infty=\{v:\Vert v\Vert_1\leq 1, v^Tx=\Vert x\Vert_\infty\}$$ We have $\Vert A^Te-A_h^Te\Vert\leq h^2 \Vert e\Vert$, because a matrix and its transpose have the same spectral norm. When $x_0$ is strictly feasible, the convergence $x_h\to x_0$ lets me assume that $x_h$ is strictly feasible as well for all sufficiently small $h$. Hence $\mu_1^{(h)}=\mu_2^{(h)}=0$, so I can easily derive $\Vert x_0-x_h\Vert\leq C h^2$.
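
As a sketch of that last step, spelled out under the assumption that both multipliers vanish (so $C=\frac{\Vert e\Vert}{2\lambda}$ is just one admissible constant): the KKT system reduces to $A_{(h)}^Te+2\lambda x=0$, which gives $$x_h=-\frac{A_h^Te}{2\lambda},\qquad x_0=-\frac{A^Te}{2\lambda},\qquad \Vert x_0-x_h\Vert=\frac{\Vert (A_h-A)^Te\Vert}{2\lambda}\leq\frac{h^2\Vert e\Vert}{2\lambda}.$$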

Unfortunately, I can't come up with estimates in the other cases.

EDIT: We can assume that $x_0\geq 0$ (componentwise); otherwise we can transform the coordinate system. And we can restrict ourselves to the case where $x_h$ lies in the same orthant as $x_0$, i.e. $x_h\geq 0$. For $x=(x_1,\dots,x_m)\geq 0 $ and $v\in \partial\Vert x\Vert_1$ we get $v_i=1$ when $x_i\not=0$ and $v_i\in [-1,1]$ otherwise.

If we now look at the fairly simple case $\Vert x_0\Vert_1=\Vert x_h\Vert_1=C_1$ and $\Vert x_0\Vert_\infty=\Vert x_h\Vert_\infty<C_2$ with $x_0,x_h>0$, then we know that $\partial\Vert x_0 \Vert_1=\partial\Vert x_h \Vert_1=\{(1,\dots,1)^T\}$. Hence $$0= A_{(h)}^Te+2\lambda x+\mu_1^{(h)}(1,\dots,1)^T.$$ But even in this simple case I don't know how to bound the convergence of $\mu_1^{(h)}\rightarrow \mu_1$.
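
One possible way through this particular case, sketched only under the assumptions just stated (all components positive, $\ell_1$ constraint active, $\ell_\infty$ constraint inactive so that $\mu_2^{(h)}=0$): write $\mathbf{1}\in\mathbb{R}^m$ for the all-ones vector and $\mu_1^{(0)}$ for the multiplier of the problem with $A$ (both are notation introduced here for illustration). Multiplying the stationarity equation from the left by $\mathbf{1}^T$ and using $\mathbf{1}^Tx=\Vert x\Vert_1=C_1$ eliminates $x$: $$0=\mathbf{1}^TA_{(h)}^Te+2\lambda C_1+m\,\mu_1^{(h)}\quad\Longrightarrow\quad \mu_1^{(h)}=-\frac{\mathbf{1}^TA_{(h)}^Te+2\lambda C_1}{m}.$$ Subtracting the two cases and applying Cauchy-Schwarz with $\Vert\mathbf{1}\Vert_2=\sqrt{m}$ and $\Vert (A_h-A)^Te\Vert\leq h^2\Vert e\Vert$ would then give $$\big|\mu_1^{(h)}-\mu_1^{(0)}\big|=\frac{\big|\mathbf{1}^T(A_h-A)^Te\big|}{m}\leq\frac{h^2\Vert e\Vert}{\sqrt{m}}.$$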

EDIT: I would furthermore like to add the condition $\sum_{i=1}^m x_i=0.$ Once again, the same problem with the convergence of the multipliers arises. One can probably incorporate this condition into the problem without adding a new constraint. E.g. we can define the matrix $P$ by $x=P\tilde{x}$, where $\tilde{x}=(x_1,\dots,x_{m-1})^T$, and notice that $\partial (f\circ P)(\tilde{x}) = P^T\partial f(P\tilde{x})$ if $f$ is convex. I'll figure this out in more detail tomorrow!

EDIT: I made some progress and could solve the case where $\Vert x\Vert_\infty \leq C_2$ is active. But I had to assume that $(x_1,\dots,x_{m-1})\geq 0$. I'm unsure whether I can assume this w.l.o.g. because of the condition $\sum_{i=1}^m x_i=0.$

I appreciate any advice.

  • Have you tried using the monotonicity of the subdifferential? The strictly feasibly case is easier because it is basically a smooth problem at that point - all nonsmoothness here is coming from the constraints. – j. kookalinski Jan 16 '22 at 15:04
  • You mean $\langle u-v,x-y\rangle \geq 0$ for $u\in\partial f(x), \ v\in\partial f(y) $? I couldn't figure out how to use it. – Nem49 Jan 17 '22 at 10:14

1 Answer


You can rewrite your objective function as $$ e^TA_hx+\lambda\|x\|_2^2=\lambda\left\|\frac{v_h}{2\lambda}+x\,\right\|_2^2-\frac{v_h^Tv_h}{4\lambda}\ , $$ where $\ v_h=A_h^Te\ $ (and likewise $\ v_0=A^Te\ $), and your constraint set, $$ \mathscr{C}=\big\{x\in\mathbb{R}^m\,\big|\,\|x\|_1\le C_1,\|x\|_\infty\le C_2\big\}\ , $$ is convex and closed. Your optimisation problem therefore reduces to finding the point $\ x\ $ in $\ \mathscr{C}\ $ which is closest (in Euclidean distance) to $\ -\frac{v_h}{2\lambda}\ $. The optimiser, $\ x_h\ $, must satisfy the inequality $$ 0\le\big(y-x_h\big)^T\left(\frac{v_h}{2\lambda}+x_h\right) $$ for all $\ y\in\mathscr{C}\ $. In particular, $$ 0\le\big(x_0-x_h\big)^T\left(\frac{v_h}{2\lambda}+x_h\right)\ , $$ because $\ x_0\in\mathscr{C}\ $, and $$ 0\le\big(x_h-x_0\big)^T\left(\frac{v_0}{2\lambda}+x_0\right)\ , $$ because $\ x_h\in\mathscr{C}\ $. Adding these two inequalities gives $$ 0\le\frac{1}{2\lambda}\big(x_0-x_h\big)^T\big(v_h-v_0\big)+\big(x_0-x_h\big)^T\big(x_h-x_0\big)\ , $$ or \begin{align} \left\|x_h-x_0\right\|_2^2&\le\frac{1}{2\lambda}\big(x_0-x_h\big)^T\big(v_h-v_0\big)\\ &=\frac{1}{2\lambda}\big(x_0-x_h\big)^T\big(A_{h}-A\big)^Te\\ &=\frac{1}{2\lambda}e^T\big(A_{h}-A\big)\big(x_0-x_h\big)\\ &\le\frac{1}{2\lambda}\left\|e\right\|_2\left\|\big(A_{h}-A\big)\big(x_0-x_h\big)\right\|_2\ , \end{align} by Cauchy-Schwarz, from which it follows, using the hypothesis $\ \|(A-A_h)x\|\le h^2\|x\|\ $, that \begin{align} \left\|x_h-x_0\right\|_2^2 &\le\frac{h^2\|e\|_2\left\|x_h-x_0\right\|_2}{2\lambda}\ , \end{align} and hence, \begin{align} \left\|x_h-x_0\right\|_2\le\frac{h^2\|e\|_2}{2\lambda}\ . \end{align} Note that the same argument works when $\ \mathscr{C}\ $ is any non-empty closed convex set, so adding the constraint $\ \sum\limits_{i=1}^mx_i=0\ $ will not cause any problems.
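
The bound can be sanity-checked numerically. The following is only a minimal sketch, not part of the argument: the helper names `project`, `minimiser` and `check_rate`, the random test data, and the bisection used to compute the projection are all assumptions made for illustration. It relies on the fact that the Euclidean projection onto the intersection of an $\ell_1$ ball and an $\ell_\infty$ ball is a soft-thresholding followed by clipping, with the threshold chosen so that the $\ell_1$ constraint holds.

```python
import math
import random


def project(z, c1, c2, iters=100):
    """Euclidean projection of z onto {x : ||x||_1 <= c1, ||x||_inf <= c2}."""

    def shrink(tau):
        # Soft-threshold magnitudes by tau, cap them at c2, keep the signs of z.
        return [math.copysign(min(max(abs(t) - tau, 0.0), c2), t) for t in z]

    x = shrink(0.0)
    if sum(abs(t) for t in x) <= c1:
        return x  # clipping to the box already satisfies the l1 constraint
    lo, hi = 0.0, max(abs(t) for t in z)  # shrink(hi) is the zero vector
    for _ in range(iters):  # bisection on the threshold tau
        mid = 0.5 * (lo + hi)
        if sum(abs(t) for t in shrink(mid)) > c1:
            lo = mid
        else:
            hi = mid
    return shrink(hi)  # hi always yields a feasible point


def minimiser(A, e, lam, c1, c2):
    """Minimiser of e^T A x + lam * ||x||_2^2 over the constraint set."""
    v = [sum(A[i][j] * e[i] for i in range(len(A))) for j in range(len(A[0]))]
    return project([-vj / (2.0 * lam) for vj in v], c1, c2)


def check_rate(h, seed=0, n=5, m=8, lam=0.5, c1=1.0, c2=0.4):
    """Return (||x_h - x_0||_2, h^2 ||e||_2 / (2 lam)) for one random instance."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    E = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    # Dividing by the Frobenius norm forces the spectral norm to be <= 1,
    # so ||(A - A_h) x|| <= h^2 ||x|| holds for every x.
    fro = math.sqrt(sum(t * t for row in E for t in row))
    A_h = [[A[i][j] + h * h * E[i][j] / fro for j in range(m)] for i in range(n)]
    x0 = minimiser(A, e, lam, c1, c2)
    xh = minimiser(A_h, e, lam, c1, c2)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(xh, x0)))
    bound = h * h * math.sqrt(sum(t * t for t in e)) / (2.0 * lam)
    return err, bound


if __name__ == "__main__":
    for h in (0.5, 0.1, 0.01):
        err, bound = check_rate(h)
        print(f"h={h}: error={err:.3e}, bound={bound:.3e}")
```

The default values of `c1` and `c2` are chosen small so that the constraints are typically active, which is the case the question is about; in each run the reported error should not exceed the reported bound.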

Derivation of inequality $\ 0\le\big(y-x_h\big)^T\left(\frac{v_h}{2\lambda}+x_h\right) $

To simplify the notation, put $\ v=\frac{v_h}{2\lambda}\ $. Then the optimisation problem is equivalent to minimising the function $\ f(x)=\|v+x\|^2\ $ over the constraint set, and for any $\ y\ $ we have \begin{align} f(y)&=\|v+y\|^2\\ &=\|v+x_h +y-x_h\|^2\\ &=f(x_h)+2(y-x_h)^T(v+x_h)+\|y-x_h\|^2\ . \end{align} Since the constraint set is convex, if $\ y\ $ is in it, then so is $\ z_\alpha=\alpha y+(1-\alpha)x_h\ $ for any $\ \alpha\in[0,1]\ $. Therefore, $\ f(x_h)\le f(z_\alpha)\ $ for all such $\ \alpha\ $. Since $\ z_\alpha-x_h=\alpha(y-x_h)\ $, this is equivalent to $$ 0\le 2\alpha(y-x_h)^T(v+x_h)+\alpha^2\|y-x_h\|^2 $$ for all $\ \alpha\in[0,1]\ $, or $$ -\frac{\alpha\|y-x_h\|^2}{2}\le (y-x_h)^T(v+x_h) $$ for all $\ \alpha\in(0,1]\ $. The desired inequality now follows by taking the limit as $\ \alpha\rightarrow0^+\ $ on the left of the inequality immediately above.

lonza leggiera
  • That, and the optimality of $\ x_h\ $. If $\ y\ $ satisfies the constraints, then so does $\ \alpha y + (1-\alpha)x_h\ $ for any $\ \alpha\in[0,1]\ $. If the inequality weren't satisfied then the value of the objective function at $\ \alpha y + (1-\alpha)x_h\ $ would be strictly smaller than at $\ x_h\ $ for any sufficiently small positive value of $\ \alpha\ $. – lonza leggiera Jan 25 '22 at 03:23
  • Right, so we have $f(x)\geq f(y) + \nabla f(y)^T(x-y), \forall x,y\in\mathscr{C}$, since $f(x) = \lVert v_h/(2\lambda) + x \rVert^2$ is convex. Thus, for a minimizer $x_h\in \mathscr{C}$, then we get $$ 0 \leq f(y) - f(x_h) \leq \nabla f(y)^T(y - x_h) = 2(y - x_h)^T(y + \frac{v_h}{2\lambda}),\quad\forall y\in\mathscr{C}. $$ Am I following you correctly, because this is not the same inequality as in your answer? – V.S.e.H. Jan 25 '22 at 09:29
  • My apologies. I missed the words "first-order" before "convexity condition" in your query. If you meant first-order *optimality* condition, my response might not have been comprehensible. I have now added a more detailed derivation of the inequality at the end of my answer. – lonza leggiera Jan 25 '22 at 11:34
  • Very nice, I think this inequality is key to the whole answer, so thanks for adding full proof of it. It reminds me of the proof for the hyperplane separation theorem, so I wouldn’t exclude the possibility of it working for non-twice diff. functions. Perhaps I can pose a question about it. Anyway +1! – V.S.e.H. Jan 25 '22 at 15:40
  • Oh, I'm fairly sure the inequality $\ 0\le(y-x_h)^T\nabla f(x_h)\ $ will still hold, even if $\ f\ $ is only once differentiable. However, without second-order derivatives available, I suspect a more delicate argument will be needed to prove it. – lonza leggiera Jan 25 '22 at 15:56
  • Actually, it's a pretty easy proof. Consider the set $D(x) = \{ s\in\mathbb{R}^n ~|~ \exists \lambda > 0: x + ts \in \mathscr{C}, \forall t\in[0,\lambda] \}$, i.e. the set of feasible directions. This set is in fact a convex cone. Now let $x^*\in\mathscr{C}$ be a minimizer of $f$, then note that $s\in D(x^*) \iff s=\lambda(y-x^*)$ for some $y\in\mathscr{C}$ and $\lambda > 0$. Then, we have that $f(x^* + s) - f(x^*) \geq 0$. Dividing both sides by $\lambda$ and taking the limit $\lambda \to 0$, we get $\nabla f(x^*)^T(y - x^*)\geq 0$. – V.S.e.H. Jan 26 '22 at 13:48