Karush-Kuhn-Tucker conditions (KKT)
Constrained Minima and Lagrange Multipliers
Consider the following constrained optimization problem:
\begin{aligned} \text{Minimize:}\qquad & f(\bm{x}) \\ \text{Subject to:}\qquad & h_i(\bm{x}) = 0 \quad \text{for} \quad i = 1,2,\ldots,m \end{aligned} \tag{1}
Second-Order Conditions
The sufficient condition for optimality states that if the Hessian of the Lagrangian, \nabla^2_x \mathcal{L}, is positive definite at the point (\bm{x}^\star, \bm{\lambda}^\star) on the kernel of the constraint Jacobian, then \bm{x}^\star is a strict local minimizer of the original constrained problem.
The following theorem formalizes this sufficient condition for optimality.
Theorem 1 (Lagrange Multipliers) Let f \in C^2(\mathbb{R}^n, \mathbb{R}) and \bm{h} \in C^2(\mathbb{R}^n, \mathbb{R}^m), and let \bm{x}^\star be a local minimum of problem Equation 1, i.e.,
- f(\bm{x}^\star) is a minimum
- subject to \bm{h}(\bm{x}^\star) = \bm{0}.
- and \nabla \bm{h}(\bm{x}^\star) is full rank
and define the Lagrangian
\mathcal{L}(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \sum_{k=1}^m \lambda_k h_k(\bm{x})
then there exist scalars \lambda_k, k = 1, \dots, m, such that:
first order necessary condition: \nabla_x \mathcal{L}(\bm{x}^\star, \bm{\lambda}) = \nabla f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla h_k(\bm{x}^\star) = 0. \tag{A}
second order conditions:
for all \bm{z} \in \mathbb{R}^n such that \nabla \bm{h}(\bm{x}^\star)\bm{z} = \bm{0} and \bm{z}\neq \bm{0}, it holds that:
necessary: \bm{z}^T \left( \nabla^2 f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla^2 h_k(\bm{x}^\star) \right) \bm{z} \geq 0, \tag{B}
i.e., the matrix \nabla^2_x \big( f(\bm{x}^\star) - \bm{\lambda} \cdot \bm{h}(\bm{x}^\star) \big) is positive semi-definite in the kernel of \nabla \bm{h}(\bm{x}^\star).
sufficient:
\bm{z}^T \left( \nabla^2 f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla^2 h_k(\bm{x}^\star) \right) \bm{z} \geq \epsilon \|\bm{z}\|^2 > 0 \quad \text{for some } \epsilon > 0, \tag{C}
i.e., the matrix \nabla^2_x \big( f(\bm{x}^\star) - \bm{\lambda} \cdot \bm{h}(\bm{x}^\star) \big) is positive definite in the kernel of \nabla \bm{h}(\bm{x}^\star).
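As a small numerical illustration of Theorem 1, the first- and second-order conditions can be checked directly on a hypothetical toy problem (the minimizer and multiplier below are computed by hand, not taken from the text):

```python
# Check of the first- and second-order Lagrange conditions for
# minimize f(x, y) = x^2 + y^2  subject to  h(x, y) = x + y - 1 = 0.
# The minimizer is x* = (1/2, 1/2) with multiplier lambda* = 1.
import numpy as np

x_star = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2.0 * x_star            # gradient of f at x*
grad_h = np.array([1.0, 1.0])    # gradient of h (constant)

# First-order condition (A): grad f - lambda * grad h = 0
stationarity = grad_f - lam * grad_h
print(np.allclose(stationarity, 0.0))  # True

# Second-order condition: the Hessian of the Lagrangian, here 2*I,
# must be positive definite on the kernel of grad h.
H = 2.0 * np.eye(2)
z = np.array([1.0, -1.0])        # spans ker(grad h), since grad_h @ z = 0
print(z @ H @ z > 0)             # True
```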
Inequality constraints
It is possible to adapt Theorem 1 to inequality constraints. Consider the NLP problem
\begin{aligned} \textrm{minimize:} \quad & f(\bm{x}) \\[0.5em] \textrm{subject to:}\quad & h_i(\bm{x})=0 \quad &i&=1,2,\ldots,m \\[0.5em] & g_i(\bm{x})\geq 0\quad &i&=1,2,\ldots,p \end{aligned} \tag{14}
Introducing the slack variables e_i for i=1,2,\ldots,p and the vector \bm{y}^T=(\bm{x}^T,\bm{e}^T), we obtain the new problem
\begin{aligned} \textrm{minimize:}\quad && \mathsf{f}(\bm{y}) &=f(\bm{x}) \\[0.5em] \textrm{subject to:}\quad && \mathsf{h}_i(\bm{y}) &= h_i(\bm{x})=0\quad &i&=1,2,\ldots,m \\[0.5em] && \mathsf{h}_{i+m}(\bm{y}) &= g_i(\bm{x})-e_i^2 = 0\quad &i&=1,2,\ldots,p \end{aligned} \tag{15}
Using this formulation we deduce:
Lemma 1 (first order condition) Defining the Lagrangian: \mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) = f(\bm{x}) -\sum_{k=1}^m \lambda_k h_k(\bm{x}) -\sum_{\ell=1}^p \mu_\ell g_\ell(\bm{x}) \tag{16}
The first order conditions for problem Equation 15 become:
\begin{aligned} \nabla_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) &=\bm{0}^T, \\[1em] h_k(\bm{x}^\star) &= 0, \qquad & k & =1,2,\ldots m\\[1em] \mu_\ell g_\ell(\bm{x}^\star) & = 0,\quad g_\ell(\bm{x}^\star)\geq 0\quad & \ell&=1,2,\ldots p\\[1em] \end{aligned}
Thus, when g_\ell(\bm{x}^\star)>0 we have \mu_\ell=0. Moreover, the gradients of the active constraints are assumed to be linearly independent.
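Lemma 1 can be illustrated on a one-dimensional toy problem (a hypothetical example; the minimizer and multiplier below are computed by hand):

```python
# KKT first-order conditions for
# minimize f(x) = x^2  subject to  g(x) = x - 1 >= 0.
# The constrained minimizer is x* = 1 with multiplier mu* = 2.

x_star = 1.0
mu = 2.0

grad_f = 2.0 * x_star   # f'(x*) = 2
grad_g = 1.0            # g'(x)  = 1
g_val = x_star - 1.0    # g(x*)  = 0, the constraint is active

# Stationarity of the Lagrangian: f'(x*) - mu * g'(x*) = 0
print(abs(grad_f - mu * grad_g) < 1e-12)                 # True
# Complementary slackness: mu * g(x*) = 0, with g(x*) >= 0 and mu >= 0
print(mu * g_val == 0.0 and g_val >= 0.0 and mu >= 0.0)  # True
```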
The second order conditions become:
Lemma 2 (second order necessary condition) Using the Lagrangian Equation 16, the second order necessary conditions for problem Equation 15 become:
\begin{aligned} \bm{z}^T\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{z} & \geq 0, \\[0.5em] \mu_\ell & \geq 0, \qquad \ell = 1,2,\ldots,p \end{aligned}
for all \bm{z} in the kernel of the gradient of the active constraints.
Lemma 3 (second order sufficient condition) Using the Lagrangian Equation 16, the second order sufficient conditions for problem Equation 15 become:
\begin{aligned} \bm{z}^T\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{z} & \geq \epsilon \|\bm{z}\|^2, \\[0.5em] \mu_\ell & = 0, \qquad \textrm{for $g_\ell(\bm{x}^\star)>0$} \\[0.5em] \mu_\ell & > 0, \qquad \textrm{for $g_\ell(\bm{x}^\star)=0$} \end{aligned}
for all \bm{z} in the kernel of the gradient of the active constraints.
The last two lemmas can be summarized as follows.
Constrained minima, NLP problem
Consider the constrained minimization problem
\begin{aligned} \textrm{minimize:}\quad && f(\bm{x})& \\[0.5em] \textrm{subject to:}\quad && h_i(\bm{x}) &=0\quad &i&=1,2,\ldots,m \\[0.5em] && g_\ell(\bm{x}) &\geq 0\quad &\ell&=1,2,\ldots,p \end{aligned} \tag{26}
Definition 1 (admissible point) A point \bm{x}^\star is admissible if
\begin{aligned} h_k(\bm{x}^\star) &= 0 \qquad & k&=1,2, \ldots ,m\\ g_\ell(\bm{x}^\star) &\geq 0 \qquad & \ell&=1,2, \ldots ,p\\ \end{aligned}
Definition 2 (Feasible region) \begin{aligned} \mathcal{F} = \Big\{ \bm{x}\in\Bbb{R}^n\;|\; & h_k(\bm{x})=0,\quad & k&=1,2,\ldots,m,\\ & g_\ell(\bm{x}) \geq 0,\quad & \ell&=1,2,\ldots,p, \Big\} \end{aligned}
is called the feasible region or set of feasible points.
Definition 3 (active constraints) The following set
\mathcal{A}(\bm{x}^\star) = \{ k\;|\; g_k(\bm{x}^\star)=0\}
is called the active constraint set. This set can be split into two subsets
\begin{aligned} \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*) &= \{ k\;|\; g_k(\bm{x}^\star)=0,\quad \mu ^*_k>0\} \\ \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*) &= \{ k\;|\; g_k(\bm{x}^\star)=0,\quad \mu ^*_k=0\} \end{aligned}
\mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*) are the strongly active constraints and \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*) are the weakly active constraints.
Obviously
\mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*)\bigcap \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*)=\emptyset \quad\textrm{and}\quad \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*)\bigcup \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*)=\mathcal{A}(\bm{x}^\star)
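The splitting of the active set into strongly and weakly active constraints can be sketched in a few lines of code (the values of g_k(\bm{x}^\star) and \mu_k^* below are illustrative, not from the text):

```python
# Splitting the active set A(x*) into strongly (A+) and weakly (A0)
# active constraints from the values g_k(x*) and the multipliers mu_k*.
import numpy as np

g_vals = np.array([0.0, 0.0, 3.5])   # g_k(x*),  k = 0, 1, 2
mu     = np.array([1.2, 0.0, 0.0])   # mu_k*

tol = 1e-12
active   = {k for k in range(len(g_vals)) if abs(g_vals[k]) <= tol}
strongly = {k for k in active if mu[k] > tol}          # A+(x*, mu*)
weakly   = {k for k in active if abs(mu[k]) <= tol}    # A0(x*, mu*)

print(active)                         # {0, 1}
print(strongly, weakly)               # {0} {1}
print((strongly | weakly) == active)  # True: A+ and A0 partition A
print((strongly & weakly) == set())   # True: they are disjoint
```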
Constrained minima and KKT
Here, we present classical theorems of constrained minimization, including those by Karush, Kuhn, Tucker, and others, without providing detailed proofs.
The following theorem (see John (1948)) establishes the necessary conditions for constrained minima. Note that no specific conditions on the constraints are required.
Theorem 2 (Fritz John) If the functions f(\bm{x}), g_1(\bm{x}), \ldots, g_p(\bm{x}), are differentiable, then a necessary condition that \bm{x}^\star be a local minimum to problem:
\begin{aligned} \textrm{minimize:}\qquad & f(\bm{x}) \\[0.5em] \textrm{subject to:}\qquad & g_i(\bm{x})\geq 0\qquad &i=1,2,\ldots,p \end{aligned}
is that there exist scalars \mu_0^\star, \mu_1^\star, \ldots, \mu_p^\star, (not all zero) such that the following inequalities and equalities are satisfied:
\begin{aligned} \nabla_x \mathcal{L}(\bm{x}^\star,\bm{\mu}^*) &=\bm{0}^T & \\ \mu_k^*g_k(\bm{x}^\star) &= 0,\qquad & k&=1,2,\ldots,p; \\ \mu_k^* &\geq 0,\qquad & k&=0,1,2,\ldots,p; \end{aligned}
where
\mathcal{L}(\bm{x},\bm{\mu}) = \mu_0 f(\bm{x}) -\sum_{k=1}^p \mu_k\,g_k(\bm{x})
In Kuhn and Tucker (1951) the authors showed that if a condition, called the first order constraint qualification, holds at \bm{x}^\star, then \mu_0 can be taken equal to 1.
Definition 4 (Constraints qualification LI) Given the unilateral (inequality) constraints \bm{g}(\bm{x}) and the bilateral (equality) constraints \bm{h}(\bm{x}), the point \bm{x}^\star is admissible if
g_k(\bm{x}^\star) \geq 0, \qquad h_k(\bm{x}^\star) = 0.
The constraints \bm{g}(\bm{x}) and \bm{h}(\bm{x}) are qualified at \bm{x}^\star if the point \bm{x}^\star is admissible and the vectors
\{\nabla g_k(\bm{x}^\star)\;:\;k\in \mathcal{A}(\bm{x}^\star)\}\; \bigcup\; \{ \nabla h_1(\bm{x}^\star),\nabla h_2(\bm{x}^\star),\ldots,\nabla h_m(\bm{x}^\star)\}
are linearly independent.
The next theorems are taken from Nocedal and Wright (2006) and McCormick (1967).
Theorem 3 (First order necessary conditions) Let f\in C^1(\Bbb{R}^n) and the constraints \bm{g}\in C^1(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^1(\Bbb{R}^n,\Bbb{R}^m). Suppose that \bm{x}^\star is a local minimum of Equation 26 and that the constraints qualification LI holds at \bm{x}^\star. Then there are Lagrange multiplier vectors \bm{\lambda} and \bm{\mu} such that the following conditions are satisfied at (\bm{x}^\star,\bm{\lambda},\bm{\mu}):
\begin{aligned} \nabla_x \mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) & =\bm{0}^T & \\[0.5em] h_k(\bm{x}^\star) &=0,\qquad & k&=1,2,\ldots,m; \\[0.5em] \mu_\ell^*g_\ell(\bm{x}^\star) &= 0,\qquad & \ell&=1,2,\ldots,p; \\[0.5em] \mu_\ell^*&\geq 0,\qquad & \ell&=1,2,\ldots,p; \end{aligned}
where
\mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) = f(\bm{x}) -\sum_{k=1}^m \lambda_k\,h_k(\bm{x}) -\sum_{\ell=1}^p \mu_\ell\,g_\ell(\bm{x})
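The conditions of Theorem 3 can be verified numerically on a hypothetical toy problem with one equality and one inequality constraint (the minimizer and multipliers below are computed by hand):

```python
# First-order KKT check for
# minimize f(x, y) = (x - 2)^2 + y^2
# subject to h(x, y) = x + y - 1 = 0 and g(x, y) = y >= 0.
# The minimizer is x* = (1, 0) with lambda* = -2 and mu* = 2 (g is active).
import numpy as np

x_star = np.array([1.0, 0.0])
lam, mu = -2.0, 2.0

grad_f = np.array([2.0 * (x_star[0] - 2.0), 2.0 * x_star[1]])  # (-2, 0)
grad_h = np.array([1.0, 1.0])
grad_g = np.array([0.0, 1.0])

# Stationarity: grad f - lambda*grad h - mu*grad g = 0
print(np.allclose(grad_f - lam * grad_h - mu * grad_g, 0.0))   # True
# Feasibility, complementary slackness and sign of mu
h_val = x_star.sum() - 1.0      # 0
g_val = x_star[1]               # 0: the inequality is active
print(h_val == 0.0 and g_val >= 0.0 and mu * g_val == 0.0 and mu >= 0.0)  # True
```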
Theorem 4 (Second order necessary conditions) Let f\in C^2(\Bbb{R}^n) and the constraints \bm{g}\in C^2(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m).
Let \bm{x}^\star satisfy the first order necessary conditions. A necessary condition for \bm{x}^\star to be a local minimum is that the m+p scalars (Lagrange multipliers) of the first order necessary conditions satisfy:
\bm{d}^T \nabla_x^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^*,\bm{\mu}^*) \bm{d} \color{blue}\geq 0
for all \bm{d} such that:
\nabla h_k(\bm{x}^\star)\bm{d} =0, \qquad \textrm{for}\quad k=1,2,\ldots,m
and for all active inequalities \ell\in \mathcal{A}(\bm{x}^\star):
\begin{aligned} \color{DarkGreen} \nabla g_\ell(\bm{x}^\star)\bm{d}=0, \qquad & \color{DarkGreen} \textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$ and $\mu_\ell>0$} \\[1em] \color{DarkGreen} \nabla g_\ell(\bm{x}^\star)\bm{d}\geq 0, \qquad & \color{DarkGreen} \textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$ and $\mu_\ell=0$} \end{aligned}
Remark 1. The conditions above restrict the space of directions to be considered. If they are replaced with
{\color{blue}\nabla g_\ell(\bm{x}^\star)\bm{d}=0,}\qquad {\color{blue}\textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$}}
Theorem 4 remains valid, because the necessary condition is then tested on a smaller set of directions.
Theorem 5 (Second order sufficient conditions) Let f\in C^2(\Bbb{R}^n) and the constraints \bm{g}\in C^2(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m).
Let \bm{x}^\star satisfy the first order necessary conditions. A sufficient condition for \bm{x}^\star to be a local minimum is that the m+p scalars (Lagrange multipliers) of the first order necessary conditions satisfy:
\bm{d}^T \nabla_x^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^*,\bm{\mu}^*) \bm{d}\color{blue}\;\;>\;\; 0
for all \bm{d}\neq\bm{0} such that
\nabla h_k(\bm{x}^\star)\bm{d} =0, \qquad \textrm{for}\quad k=1,2,\ldots,m
and for all active inequalities k\in \mathcal{A}(\bm{x}^\star):
\begin{aligned} \nabla g_k(\bm{x}^\star)\bm{d} &= 0, \qquad & \textrm{if $k\in \mathcal{A}(\bm{x}^\star)$ and $\mu_k>0$} \\[1em] {\color{Magenta}\nabla g_k(\bm{x}^\star)\bm{d}} & {\color{Magenta}\;\geq 0,} \qquad & {\color{Magenta}\textrm{if $k\in \mathcal{A}(\bm{x}^\star)$ and $\mu_k=0$}} \end{aligned}
Remark 2. The condition
{\color{Magenta}\nabla g_k(\bm{x}^\star)\bm{d}\geq 0,\qquad \textrm{if $k\in \mathcal{A}(\bm{x}^\star)$ and $\mu_k=0$}}
restricts the space of directions to be considered. If it is omitted, Theorem 5 remains valid, because the sufficient condition is then tested on a larger set of directions.
Other constraints qualifications
In the study of optimality conditions the constraints and their gradients cannot be arbitrary: they must satisfy additional analytic/geometric properties. These properties are called constraint qualifications. The simplest (and also the most restrictive) qualification is linear independence (LI).
Definition 5 (Constraints qualification LI) Given the inequality constraints \bm{g}(\bm{x}) and equality constraints \bm{h}(\bm{x}), we will say that an admissible point \bm{x}^\star is qualified if the vectors
\{\nabla g_k(\bm{x}^\star)\;:\;k\in \mathcal{A}(\bm{x}^\star)\} \;\bigcup\; \{ \nabla h_1(\bm{x}^\star),\nabla h_2(\bm{x}^\star),\ldots ,\nabla h_m(\bm{x}^\star)\}
are linearly independent.
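Numerically, the LI qualification amounts to a rank test on the stacked gradients (a sketch with made-up gradient vectors; in practice the gradients would come from the problem at hand):

```python
# LI constraint qualification: stack the gradients of the active inequality
# constraints and of all equality constraints, then test full row rank.
import numpy as np

grad_g_active = np.array([[1.0, 0.0, 0.0]])       # one active inequality
grad_h = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])              # two equality constraints

G = np.vstack([grad_g_active, grad_h])
li_qualified = bool(np.linalg.matrix_rank(G) == G.shape[0])
print(li_qualified)   # True: the three gradients are linearly independent
```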
Mangasarian-Fromovitz qualification
This qualification is less stringent than the previous one.
Definition 6 (Mangasarian-Fromovitz qualification) Given the inequality constraints \bm{g}(\bm{x}) and equality constraints \bm{h}(\bm{x}), we will say that an admissible point \bm{x}^\star is qualified if there exists no linear combination
\sum_{\ell\in \mathcal{A}(\bm{x}^\star)} \alpha_\ell \nabla g_\ell(\bm{x}^\star)+ \sum_{k=1}^m \beta_k \nabla h_k(\bm{x}^\star) = \bm{0}
with \alpha_\ell\geq 0 for \ell\in \mathcal{A}(\bm{x}^\star) and the \alpha_\ell, \beta_k not all zero. That is, there is no nontrivial linear combination of the gradients yielding the null vector with \alpha_\ell\geq 0 for \ell\in \mathcal{A}(\bm{x}^\star).
Garth P. McCormick qualification
Definition 7 (Garth P. McCormick constraints qualification (1st order)) Given an admissible point \bm{x}^\star, the constraints are first order qualified if for every direction \bm{d} that satisfies
\begin{aligned} \nabla h_k(\bm{x}^\star)\bm{d} &=0,\quad k\in \{1,2, \ldots ,m\}, \\ \nabla g_\ell(\bm{x}^\star)\bm{d} &\geq 0,\quad \ell\in\mathcal{A}(\bm{x}^\star), \end{aligned}
there exists a curve \bm{x}\in \mathtt{C}^1(\Bbb{R},\Bbb{R}^n) and an \varepsilon>0 such that for 0<t<\varepsilon:
\begin{aligned} \bm{x}(0)&=\bm{x}^\star,\quad & h_k(\bm{x}(t))&=0, \quad k\in \{1,2, \ldots ,m\}, \\ \bm{x}'(0) &=\bm{d}, \quad & g_\ell(\bm{x}(t))&\geq 0, \quad \ell\in \{1,2, \ldots ,p\}. \end{aligned}
Definition 8 (Garth P. McCormick constraints qualification (2nd order)) Given an admissible point \bm{x}^\star, the constraints are second order qualified if for every direction \bm{d} that satisfies
\begin{aligned} \nabla h_k(\bm{x}^\star)\bm{d} &=0,\qquad k\in \{1,2, \ldots ,m\}, \\ \nabla g_\ell(\bm{x}^\star)\bm{d}&=0,\qquad \ell\in \mathcal{A}(\bm{x}^\star), \end{aligned}
there exists a curve \bm{x}\in \mathtt{C}^2(\Bbb{R},\Bbb{R}^n) and an \varepsilon>0 such that for 0<t<\varepsilon:
\begin{aligned} \bm{x}(0) &=\bm{x}^\star, \quad & h_k(\bm{x}(t))&= 0,\quad & k &\in \{1,2, \ldots ,m\}, \\ \bm{x}'(0) &=\bm{d}, \quad & g_\ell(\bm{x}(t))&=0,\quad & \ell & \in \mathcal{A}(\bm{x}^\star). \end{aligned}
Sylvester’s Criterion for SPD Matrices
Theorem 6 (Sylvester’s Criterion) A symmetric matrix \bm{A} \in \Bbb{R}^{n \times n} is positive definite if and only if all its leading principal minors are positive. That is, \bm{A} is positive definite if and only if: \det(\bm{A}_k) > 0 \quad \text{for} \quad k = 1, 2, \ldots, n where \bm{A}_k is the upper-left k \times k submatrix (leading principal minor) of \bm{A}.
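Sylvester's criterion translates directly into code (a sketch; the helper names and the test matrix are illustrative choices, not from the text):

```python
# Sylvester's criterion: a symmetric matrix is positive definite iff all
# leading principal minors are positive.
import numpy as np

def leading_minors(A):
    """Determinants of the upper-left k x k submatrices, k = 1..n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def is_positive_definite(A):
    return all(m > 0 for m in leading_minors(A))

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
print([float(m) for m in leading_minors(A)])  # [2.0, 3.0]
print(is_positive_definite(A))                # True
```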
A trick for positive semi-definite matrices
Sylvester's theorem gives no way to decide whether a matrix is positive semi-definite when a leading principal minor is equal to zero. However, one can decide by checking whether the matrix
\bm{A} +\varepsilon \bm{I}
is positive definite for values of \varepsilon approaching zero from the positive direction (\varepsilon\rightarrow 0^+).
Example 1 (Trick for usage of the Sylvester theorem on positive semi-definite matrices) Let us consider the matrix
\bm{A} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
The first leading principal minor, \det(\bm{A}_1) = \det[1] = 1, is positive, so we compute the second one, which turns out to be zero:
\det(\bm{A}_2) = \det\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = 4 - 2\cdot 2 = 0
This gives no clue as to whether \bm{A} is positive semi-definite or not, but using the trick defined above we can compute
\det\begin{bmatrix} 1+\varepsilon & 2 \\ 2 & 4 + \varepsilon \end{bmatrix} = (1+ \varepsilon)(4+\varepsilon) - 4 = \varepsilon^2 + 5\varepsilon = \varepsilon(\varepsilon + 5)
This expression, for \varepsilon \rightarrow 0^+, is strictly greater than zero, which concludes our analysis: \bm{A} is positive semi-definite.
The same result can be obtained by computing the eigenvalues of \bm{A}, i.e., the roots of the polynomial
\begin{aligned} p(\lambda) &= \det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} \\ & = (1-\lambda)(4-\lambda) - 4 \\ & = \lambda^2-5\lambda \qquad \implies \quad \lambda_1=0,\; \lambda_2 = 5 \end{aligned}
Having one eigenvalue equal to zero (and the other positive) shows that \bm{A} is positive semi-definite.
Example 2 (Counterexample for the Sylvester theorem) Let us consider the matrix
\bm{A} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
The first leading principal minor, \det(\bm{A}_1) = \det[1] = 1, is positive, while the second one is zero; in fact
\det(\bm{A}_2) = \det\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = 0
This result gives no information, but we can use the trick by considering the determinant
\det\begin{bmatrix} 1 + \varepsilon & 1 \\ 1 & 1 + \varepsilon \end{bmatrix} = (1+\varepsilon)^2 - 1 = \varepsilon^2 +2\varepsilon > 0
for \varepsilon > 0.
Since \bm{A} is a 3\times3 matrix, we also need to compute the third leading principal minor, which yields another zero:
\begin{aligned} \det(\bm{A}_3) &= \det \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \\ & = 1 \det\begin{bmatrix} 1&1\\1&1 \end{bmatrix} - 1 \det\begin{bmatrix}1&1\\1&1\end{bmatrix} + 0 \det\begin{bmatrix}1&1\\1&1\end{bmatrix} \\ & = 0 \end{aligned}
We can apply the trick once more, considering the determinant
\begin{aligned} p(\varepsilon) & = \det\begin{bmatrix} 1+\varepsilon & 1 & 1 \\ 1 & 1+\varepsilon & 1 \\ 1 & 1 & \varepsilon\end{bmatrix} \\ & = (1+\varepsilon)\big(\varepsilon^2 + \varepsilon - 1\big) - (\varepsilon-1) - \varepsilon \\ & = \varepsilon\big(\varepsilon^2 + 2\varepsilon - 2\big) \end{aligned}
Since the derivative
p'(\varepsilon) =3\varepsilon^2 + 4\varepsilon -2
is negative for \varepsilon approaching zero from the right, we have p(\varepsilon)<0 for small \varepsilon>0, and we can conclude that \bm{A} is not positive semi-definite.
The same result can be confirmed by calculating the eigenvalues of the matrix:
p(\lambda) = \det\begin{bmatrix} 1 -\lambda & 1 & 1 \\ 1 & 1 -\lambda & 1 \\ 1 & 1 & -\lambda \end{bmatrix} = -\lambda\big(\lambda^2 - 2\lambda - 2\big)
\implies \qquad \lambda_1 = 0 \qquad \lambda_{2,3} = \frac{2\pm \sqrt{12}}{2} = 1 \pm \sqrt 3
Since \lambda_3 = 1-\sqrt{3} < 0 is negative, the matrix \bm{A} is certainly not positive semi-definite.
This trick can be summarized in the following lemma.
Lemma 4 (Sylvester’s Trick) A symmetric matrix \bm{A} \in \mathbb{R}^{n \times n} is positive semi-definite if and only if, for all sufficiently small \varepsilon > 0, every leading principal minor of the perturbed matrix \bm{A} + \varepsilon \bm{I} is positive:
\det\big((\bm{A} + \varepsilon \bm{I})_k\big) > 0 \quad \text{for} \quad k = 1, 2, \ldots, n
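The trick can be sketched in code by applying Sylvester's criterion to \bm{A} + \varepsilon\bm{I} for a small \varepsilon > 0, on the matrices of Example 1 and Example 2 (the function name and the fixed \varepsilon are illustrative choices; a robust implementation would let \varepsilon vary):

```python
# Perturbation trick: Sylvester's criterion applied to A + eps*I.
import numpy as np

def psd_by_perturbation(A, eps=1e-6):
    """Heuristic PSD test: all leading minors of A + eps*I positive."""
    B = A + eps * np.eye(A.shape[0])
    n = B.shape[0]
    return all(np.linalg.det(B[:k, :k]) > 0 for k in range(1, n + 1))

A1 = np.array([[1.0, 2.0],
               [2.0, 4.0]])              # Example 1: PSD (eigenvalues 0, 5)
A2 = np.array([[1.0, 1.0, 1.0],
               [1.0, 1.0, 1.0],
               [1.0, 1.0, 0.0]])         # Example 2: not PSD

print(psd_by_perturbation(A1))   # True
print(psd_by_perturbation(A2))   # False
```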