6

I'm trying to get an intuitive feel for the various constraint qualifications for KKT points. Most of them seem to rely on the linear independence of the gradients $\nabla h_j(x^*)$, where the $h_j$ are the equality constraints. The book doesn't really state why.

The first KKT condition states

$\nabla f(x^*) + \sum_i \mu_i\nabla g_i(x^*) + \sum_j \lambda_j\nabla h_j(x^*) = \mathbf{0}$

A hazy first guess is that if the gradients were linearly dependent, then many different choices of $\lambda$ could satisfy the condition, thus producing 'trivial' KKT points. We need to ensure that the term associated with the equality constraints vanishes only when $\lambda_j = 0$ for all $j$.

I think this is somewhat analogous to the multiplier on $\nabla f(x^*)$ potentially being zero in the Fritz John conditions.

Self-studying is hard :) Am I anywhere close here?

Benjamin Lindqvist

2 Answers

4

With an eight month delay:

Let me first state that I'm not a pro but a self-student myself, which might ease our conversation.

What I noted is that we seem to use different definitions of the linear independence constraint qualification. Nocedal/Wright's Numerical Optimization (1999, 1E) states in

Definition 12.1 (LICQ). Given the point $x^*$ and the active set $\mathcal{A}(x^*)$ defined by (12.29), we say that the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients $\{\nabla c_i(x^*),\ i \in \mathcal{A}(x^*)\}$ is linearly independent.

Definition (12.29), in turn, says that the active set comprises the indices of the equality constraints and of the active inequality constraints. (The $c_i$ in the above definition cover equality and inequality constraints alike.)
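The definition can be checked numerically once the active constraint gradients at $x^*$ are known. The following is a minimal sketch in plain Python, not something from the book: the names `matrix_rank` and `licq_holds` and the tolerance `tol` are hypothetical, and the caller is assumed to pass only the gradients of the equality constraints and the active inequality constraints.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Choose the largest entry in this column among the rows not yet used as pivots.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def licq_holds(active_gradients, tol=1e-10):
    """LICQ holds iff the active constraint gradients are linearly independent,
    i.e. the matrix with those gradients as rows has full row rank."""
    return matrix_rank(active_gradients, tol) == len(active_gradients)


# Two independent gradients in R^2: LICQ holds.
independent = licq_holds([[1.0, 0.0], [0.0, 1.0]])   # True
# Parallel gradients: LICQ fails.
parallel = licq_holds([[1.0, 1.0], [2.0, 2.0]])      # False
```

A single vanishing gradient, such as `[[0.0]]`, also fails the check, because the set containing only the zero vector is linearly dependent.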

Lagrangian stationarity (given by you above), plus the requirements that

  • only the active constraints enter the sum, and
  • the coefficients of the active inequality constraints are non-negative,

is equivalent to stating that in a vicinity of $x^*$ there is no feasible point at which the objective function takes a smaller value than at $x^*$.

Here, the LICQ guarantees that any sequence of feasible points $z_k$ that converges towards $x^*$ has the property $f(z_k) > f(x^*)$ for sufficiently large $k$, that is, in a vicinity of $x^*$.

But I must admit that it's not totally clear to me why the LICQ is required for a first-order necessary optimality condition for constrained problems (see this post).
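One way to see what a constraint qualification buys is a toy problem in which a minimizer is not a KKT point. This is a sketch in the question's notation, with a single equality constraint $h$ and the sign convention of the stationarity condition quoted in the question; the specific problem is an illustrative assumption, not taken from Nocedal/Wright.

```latex
\begin{align*}
&\min_{x \in \mathbb{R}} \; f(x) = x \quad \text{subject to} \quad h(x) = x^2 = 0. \\
&\text{The only feasible point is } x^* = 0, \text{ so it is the global minimizer.} \\
&\nabla f(x^*) = 1, \qquad \nabla h(x^*) = 2x^* = 0. \\
&\text{Stationarity would require } 1 + \lambda \cdot 0 = 0, \text{ which no } \lambda \text{ satisfies.}
\end{align*}
```

The set $\{\nabla h(x^*)\} = \{0\}$ is linearly dependent, so LICQ fails at $x^*$, and the first-order condition fails with it even though $x^*$ is optimal.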

    Max Herrmann
    1

The direction of decrease (or increase) of the objective at the optimal point needs to point away from the constraint set by definition.

The gradient of each constraint at the optimal point is a vector. The collection of these gradients forms a basis for the space of infeasible directions, with the Lagrange multipliers being the coordinates in that basis.

It is therefore required that the gradient vector of the objective function at the optimal point lie in this space, as it would point in an infeasible direction.

The first-order conditions simply state that the gradient of the objective at the optimal point can be expressed in terms of the basis vectors of the space it lives in (the basis vectors being the gradients of the constraints), with the Lagrange multipliers being the coordinates with respect to that basis.
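The word "basis" is where linear independence enters. A short sketch for the equality-constrained case, using the question's notation $h_j$, $\lambda_j$ (the symbols $\nu$ and $\bar\lambda$ are introduced here only for illustration), shows that coordinates are unique exactly when the gradients are independent:

```latex
\begin{align*}
&\text{Suppose } \nabla f(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = \mathbf{0}
 \quad\text{and}\quad
 \nabla f(x^*) + \sum_j \nu_j \nabla h_j(x^*) = \mathbf{0}. \\
&\text{Subtracting gives } \sum_j (\lambda_j - \nu_j)\, \nabla h_j(x^*) = \mathbf{0}. \\
&\text{If the } \nabla h_j(x^*) \text{ are linearly independent, then } \lambda_j = \nu_j \text{ for all } j. \\
&\text{If instead } \sum_j \bar\lambda_j \nabla h_j(x^*) = \mathbf{0} \text{ for some } \bar\lambda \neq \mathbf{0},
 \text{ then } \lambda + t\,\bar\lambda \text{ is a valid multiplier for every } t \in \mathbb{R}.
\end{align*}
```

So with dependent gradients the constraint gradients only span the space, and the multipliers stop being well-defined coordinates.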

    varunmarda
    • I don't think this answers the question. The linear independence of the equality constraints (let's say the problem only has equality constraints for simplicity), aka LICQ, is a necessary condition for a minimizer point $x^*$ to satisfy the KKT conditions. You explain the intuition behind why we expect the first KKT condition to hold at a minimum, but the question is, "Why is LICQ necessary for the first KKT condition to always hold at a minimum?" In particular, your intuition taken alone seems to imply that it isn't necessary since why should it matter if the basis vectors that are the ... – nkyraf33 May 23 '21 at 20:23
    • ... gradients of the constraints have "duplicates" due to not being linearly independent? By your intuition alone, it seems like it would just be enough for the gradient of the objective to live in the space spanned by the gradients of the constraints, meaning we don't need LICQ. I'm not an expert though, so let me know if I'm missing your point. – nkyraf33 May 23 '21 at 20:25
    • Edit: For my first comment above (which I unfortunately can't edit anymore), I should have said that LICQ is a sufficient condition for a minimizer point $x^*$ to satisfy the KKT conditions. It isn't a necessary condition though, since there are other constraint qualifications you can use. – nkyraf33 May 24 '21 at 18:17
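The distinction drawn in the last comment, sufficient but not necessary, can be illustrated with a one-dimensional toy problem. It is an assumption chosen for illustration and uses the question's notation and sign convention:

```latex
\begin{align*}
&\min_{x \in \mathbb{R}} \; f(x) = x^2 \quad \text{subject to} \quad h(x) = x^2 = 0. \\
&\text{The only feasible point is } x^* = 0, \text{ so it is the minimizer.} \\
&\nabla f(x^*) = 0, \qquad \nabla h(x^*) = 0. \\
&\text{Stationarity reads } 0 + \lambda \cdot 0 = 0, \text{ which holds for every } \lambda \in \mathbb{R}.
\end{align*}
```

Here LICQ fails because $\nabla h(x^*) = 0$, yet $x^*$ satisfies the first KKT condition, with a multiplier that is not unique.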