Karush-Kuhn-Tucker conditions (KKT)

Enrico Bertolazzi

University of Trento, Department of Industrial Engineering

Constrained Minima and Lagrange Multipliers

Consider the following constrained optimization problem:

\begin{aligned} \text{Minimize:}\qquad & f(\bm{x}) \\ \text{Subject to:}\qquad & h_i(\bm{x}) = 0 \quad \text{for} \quad i = 1,2,\ldots,m \end{aligned} \tag{1}

Solution Algorithm
  1. Form the Lagrangian: Define the Lagrangian function by incorporating the constraints into the objective function:

    \mathcal{L}(\bm{x},\bm{\lambda}) = f(\bm{x}) - \sum_{k=1}^{m} \lambda_k h_k(\bm{x}) where \lambda_k are the Lagrange multipliers.

  2. Solve the KKT conditions: Find the critical points by solving the system of equations given by the gradients of the Lagrangian:

    \nabla_{\bm{x}}\mathcal{L}(\bm{x}, \bm{\lambda}) = \bm{0}^T \quad \text{and} \quad \bm{h}(\bm{x}) = \bm{0} This nonlinear system must be solved simultaneously for \bm{x} and \bm{\lambda}.

  3. Check linear independence of constraints: For each solution (\bm{x}^\star, \bm{\lambda}^\star), compute the Jacobian matrix \nabla \bm{h}(\bm{x}^\star) of the constraint functions. Verify that the rows of this matrix are linearly independent, meaning the matrix has full rank.

  4. Calculate the kernel of the constraint Jacobian: Compute the matrix \bm{K}, which forms the null space (kernel) of \nabla \bm{h}(\bm{x}^\star). This satisfies:

    \nabla \bm{h}(\bm{x}^\star) \bm{K} = \bm{0}

  5. Compute the reduced Hessian: The reduced Hessian \bm{H} is a key matrix for analyzing the second-order conditions. It is defined as:

    \bm{H} = \bm{K}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}^\star) \bm{K}

  6. Verify second-order conditions:

    • Necessary condition: \bm{H} must be positive semi-definite for \bm{x}^\star to be a candidate local minimum.
    • Sufficient condition: if \bm{H} is positive definite, then \bm{x}^\star is a strict local minimum.
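A minimal numerical sketch of the six steps above (assuming NumPy and SciPy are available; the toy problem min x_1^2+x_2^2 subject to x_1+x_2-1=0 and all names in the code are our own illustration, not part of the algorithm statement):

```python
import numpy as np
from scipy.linalg import null_space

grad_f = lambda x: 2.0 * x                    # gradient of f(x) = x1^2 + x2^2
grad_h = lambda x: np.array([[1.0, 1.0]])     # Jacobian of h(x) = x1 + x2 - 1
hess_L = lambda x, lam: 2.0 * np.eye(2)       # Hessian of the Lagrangian

# Step 2: here the KKT system is linear, so it can be solved directly:
#   2 x - lam (1,1)^T = 0,  x1 + x2 = 1.
A = np.array([[2.0, 0.0, -1.0], [0.0, 2.0, -1.0], [1.0, 1.0, 0.0]])
x1, x2, lam = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
x_star = np.array([x1, x2])
assert np.allclose(grad_f(x_star), lam * grad_h(x_star).ravel())

# Step 3: the constraint Jacobian must have full row rank.
assert np.linalg.matrix_rank(grad_h(x_star)) == 1

# Steps 4-6: kernel K, reduced Hessian, and its eigenvalues.
K = null_space(grad_h(x_star))                # n x (n - m) matrix
H = K.T @ hess_L(x_star, lam) @ K
print(x_star, lam, np.linalg.eigvalsh(H))     # all positive -> strict minimum
```

Here scipy.linalg.null_space computes the matrix \bm{K} of step 4, and the eigenvalues of the reduced Hessian decide the second-order test of step 6.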

Second-Order Conditions

The sufficient condition for optimality states that if the reduced Hessian matrix \bm{H} is positive definite at the point (\bm{x}^\star, \bm{\lambda}^\star), then \bm{x}^\star is a strict local minimizer of the original constrained problem.

The following theorem formalizes this sufficient condition for optimality.

Theorem 1 (Lagrange multipliers) Let f \in C^2(\mathbb{R}^n, \mathbb{R}) and \bm{h} \in C^2(\mathbb{R}^n, \mathbb{R}^m), and let \bm{x}^\star be a local minimum of problem Equation 1, i.e.,

  • f(\bm{x}^\star) is a local minimum value,
  • subject to \bm{h}(\bm{x}^\star) = \bm{0},
  • and \nabla \bm{h}(\bm{x}^\star) has full rank,

and define the Lagrangian

\mathcal{L}(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \sum_{k=1}^m \lambda_k h_k(\bm{x})

then there exist scalars \lambda_k, k = 1, \dots, m, such that:

  1. first order necessary condition: \nabla_x \mathcal{L}(\bm{x}^\star, \bm{\lambda}) = \nabla f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla h_k(\bm{x}^\star) = 0. \tag{A}

  2. second order condition:
    for all \bm{z} \in \mathbb{R}^n such that \nabla \bm{h}(\bm{x}^\star)\bm{z} = \bm{0} and \bm{z}\neq \bm{0}, it holds that:

    • necessary: \bm{z}^T \left( \nabla^2 f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla^2 h_k(\bm{x}^\star) \right) \bm{z} \geq 0, \tag{B}

      i.e., the matrix \nabla^2_x \big( f(\bm{x}^\star) - \bm{\lambda} \cdot \bm{h}(\bm{x}^\star) \big) is positive semi-definite in the kernel of \nabla \bm{h}(\bm{x}^\star).

    • sufficient:

      \bm{z}^T \left( \nabla^2 f(\bm{x}^\star) - \sum_{k=1}^m \lambda_k \nabla^2 h_k(\bm{x}^\star) \right) \bm{z} \geq \epsilon \|\bm{z}\|^2>0, \tag{C}

      i.e., the matrix \nabla^2_x \big( f(\bm{x}^\star) - \bm{\lambda} \cdot \bm{h}(\bm{x}^\star) \big) is positive definite in the kernel of \nabla \bm{h}(\bm{x}^\star).

We now prove the theorem. Let \bm{x}^\star be a local minimum. Then, there exists \epsilon > 0 such that

f(\bm{x}^\star) \leq f(\bm{x}), \quad \text{for all } \bm{x} \in \mathcal{B} \text{ with } \bm{h}(\bm{x}) = \bm{0}, \tag{2}

where

\mathcal{B} = \{ \bm{x} \mid \|\bm{x} - \bm{x}^\star\| \leq \epsilon \}. \tag{3}

We define a sequence of functions:

f_k(\bm{x}) = f(\bm{x}) + k \|\bm{h}(\bm{x})\|^2 + \alpha \|\bm{x} - \bm{x}^\star\|^2, \quad \alpha > 0, \tag{4}

with the corresponding sequence of unconstrained local minima \bm{x}_k \in \mathcal{B}, i.e.,

\bm{x}_k = \arg\min_{\bm{x} \in \mathcal{B}} f_k(\bm{x}).

Since \bm{x}_k is contained within the compact ball \mathcal{B}, a convergent subsequence \bm{x}_{k_j} \to \overline{\bm{x}} \in \mathcal{B} exists by compactness.
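The construction in Equation 4 is the classical quadratic-penalty device, and the convergence \bm{x}_k\to\bm{x}^\star can be observed numerically; a minimal sketch (SciPy assumed; the toy problem, with \bm{x}^\star=(1/2,1/2), is our own illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (an assumption): f(x) = x1^2 + x2^2, h(x) = x1 + x2 - 1,
# whose constrained minimizer is x* = (1/2, 1/2).
f = lambda x: x[0]**2 + x[1]**2
h = lambda x: np.array([x[0] + x[1] - 1.0])
x_star = np.array([0.5, 0.5])
alpha = 1.0

for k in [1.0, 10.0, 100.0, 1000.0]:
    # f_k of Equation 4; the default-argument trick freezes k in the lambda.
    fk = lambda x, k=k: f(x) + k * np.sum(h(x)**2) + alpha * np.sum((x - x_star)**2)
    xk = minimize(fk, np.zeros(2)).x       # unconstrained minimizer of f_k
    print(k, xk, np.linalg.norm(h(xk)))    # ||h(x_k)|| -> 0 as k grows
```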

The remainder of the proof verifies that \bar{\bm{x}}=\bm{x}^\star and that it is a minimum.

Step 1: \bm{h}(\bar{\bm{x}})=\bm{0}.

Notice that the sequence \bm{x}_k satisfies f_k(\bm{x}_k)\leq f(\bm{x}^\star); in fact,

f_k(\bm{x}_k) \leq f_k(\bm{x}^\star) = f(\bm{x}^\star)+k\|\bm{h}(\bm{x}^\star)\|^2+\alpha \|\bm{x}^\star-\bm{x}^\star\|^2 = f(\bm{x}^\star).

Hence, by the definition in Equation 4, we have

\begin{aligned} k_j\|\bm{h}(\bm{x}_{k_j})\|^2+\alpha \|\bm{x}_{k_j}-\bm{x}^\star\|^2 &\leq f(\bm{x}^\star)-f(\bm{x}_{k_j}) \\ &\leq f(\bm{x}^\star)-\min_{\bm{x}\in B}f(\bm{x}) = C < +\infty \end{aligned} \tag{5}

so that

\begin{aligned} \lim_{j\to\infty}\|\bm{h}(\bm{x}_{k_j})\|&=0 \\ &\Downarrow\\ \left\|\bm{h}\left(\lim_{j\to\infty}\bm{x}_{k_j}\right)\right\|&= \left\|\bm{h}\left(\bar{\bm{x}}\right)\right\| =0 \\ &\Downarrow\\ \bm{h}(\bar{\bm{x}})&=\bm{0}. \end{aligned}

Step 2: \bar{\bm{x}}=\bm{x}^\star.

From Equation 5, we have:

\begin{aligned} \alpha \|\bm{x}_{k_j} - \bm{x}^\star\|^2 & \leq f(\bm{x}^\star) - f(\bm{x}_{k_j}) - k_j \|\bm{h}(\bm{x}_{k_j})\|^2 \\ & \leq f(\bm{x}^\star) - f(\bm{x}_{k_j}). \end{aligned}

Taking the limit as j approaches infinity gives us:

\begin{aligned} \alpha \Big\|\lim_{j \to \infty} \bm{x}_{k_j} - \bm{x}^\star\Big\|^2 & = \alpha \|\bar{\bm{x}} - \bm{x}^\star\|^2 \\ & \leq f(\bm{x}^\star) - \lim_{j \to \infty} f(\bm{x}_{k_j}) \\ & = f(\bm{x}^\star) - f(\bar{\bm{x}}). \end{aligned}

Given that \|\bm{h}(\bar{\bm{x}})\| = 0 and \bar{\bm{x}}\in\mathcal{B}, we can infer from Equation 2 that:

f(\bm{x}^\star) \leq f(\bar{\bm{x}}).

Thus, we can combine the inequalities:

\alpha \|\bar{\bm{x}} - \bm{x}^\star\|^2 \leq f(\bm{x}^\star) - f(\bar{\bm{x}}) \leq 0.

This implies that:

\bar{\bm{x}} = \bm{x}^\star.

Step 3: Build the multipliers.

Since \bm{x}_{k_j} are unconstrained local minima for f_{k_j}(\bm{x}), it follows that:

\nabla f_{k_j}(\bm{x}_{k_j}) = \nabla f(\bm{x}_{k_j}) + k_j \nabla \|\bm{h}(\bm{x}_{k_j})\|^2 + \alpha \nabla \|\bm{x}_{k_j} - \bm{x}^\star\|^2 = \bm{0}.

Recalling the gradient identities:

\begin{aligned} \nabla \|\bm{x}\|^2 &= \nabla (\bm{x} \cdot \bm{x}) = 2\bm{x}^T,\\ \nabla \|\bm{h}(\bm{x})\|^2 &= \nabla (\bm{h}(\bm{x}) \cdot \bm{h}(\bm{x})) = 2\bm{h}(\bm{x})^T\nabla \bm{h}(\bm{x}), \end{aligned}

we can express the gradient condition as:

\nabla f(\bm{x}_{k_j})^T + 2k_j \nabla \bm{h}(\bm{x}_{k_j})^T \bm{h}(\bm{x}_{k_j}) + 2\alpha (\bm{x}_{k_j} - \bm{x}^\star) = \bm{0}. \tag{6}

Next, we left-multiply by \nabla \bm{h}(\bm{x}_{k_j}):

\nabla \bm{h}(\bm{x}_{k_j}) \left[\nabla f(\bm{x}_{k_j})^T + 2\alpha (\bm{x}_{k_j} - \bm{x}^\star)\right] + 2k_j \nabla \bm{h}(\bm{x}_{k_j}) \nabla \bm{h}(\bm{x}_{k_j})^T \bm{h}(\bm{x}_{k_j}) = \bm{0}.

Since \nabla \bm{h}(\bm{x}^\star) \in \mathbb{R}^{m\times n} has full rank, by continuity \nabla \bm{h}(\bm{x}_{k_j}) also has full rank for sufficiently large j. Therefore, the matrix \nabla \bm{h}(\bm{x}_{k_j})\nabla \bm{h}(\bm{x}_{k_j})^T \in \mathbb{R}^{m\times m} is nonsingular.

This leads us to:

2k_j \bm{h}(\bm{x}_{k_j}) = -\left(\nabla \bm{h}(\bm{x}_{k_j})\nabla \bm{h}(\bm{x}_{k_j})^T\right)^{-1} \nabla \bm{h}(\bm{x}_{k_j}) \left[\nabla f(\bm{x}_{k_j})^T + 2\alpha (\bm{x}_{k_j}-\bm{x}^\star)\right].

Taking the limit as j approaches infinity, we obtain:

\lim_{j\to\infty} 2k_j \bm{h}(\bm{x}_{k_j}) = -\left(\nabla \bm{h}(\bm{x}^\star)\nabla \bm{h}(\bm{x}^\star)^T\right)^{-1} \nabla \bm{h}(\bm{x}^\star)\nabla f(\bm{x}^\star)^T = -\bm{\lambda}. \tag{7}

Finally, substituting the limit Equation 7 back into Equation 6 yields:

\nabla f(\bm{x}^\star)^T - \nabla \bm{h}(\bm{x}^\star)^T\bm{\lambda} = \bm{0}. \tag{8}
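The limit in Equation 7 identifies \bm{\lambda} with the least-squares solution of \nabla\bm{h}(\bm{x}^\star)^T\bm{\lambda} = \nabla f(\bm{x}^\star)^T; a minimal sketch of this multiplier estimate (NumPy assumed; the toy data are hypothetical):

```python
import numpy as np

def multiplier_estimate(gradf, gradh):
    # lambda = (grad_h grad_h^T)^{-1} grad_h grad_f^T, the least-squares
    # solution of grad_h^T lambda = grad_f^T (grad_h has full row rank).
    return np.linalg.solve(gradh @ gradh.T, gradh @ gradf)

gradf = np.array([1.0, 1.0])       # hypothetical grad f at x* = (1/2, 1/2)
gradh = np.array([[1.0, 1.0]])     # hypothetical Jacobian of h at x*
print(multiplier_estimate(gradf, gradh))   # -> [1.]
```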

Step 4: Build a special sequence of \bm{z}_j.

We need to construct a sequence \bm{z}_j \to \bm{z} such that

\nabla \bm{h}(\bm{x}_{k_j}) \bm{z}_j = \bm{0} \tag{9}

for all j. To achieve this, we define the sequence \bm{z}_j as the projection of \bm{z} onto the kernel of \nabla \bm{h}(\bm{x}_{k_j}). This is given by:

\bm{z}_j = \bm{z} - \nabla \bm{h}(\bm{x}_{k_j})^T \left[\nabla \bm{h}(\bm{x}_{k_j}) \nabla \bm{h}(\bm{x}_{k_j})^T\right]^{-1} \nabla \bm{h}(\bm{x}_{k_j}) \bm{z}.

To verify that this construction satisfies our requirement, we compute:

\begin{aligned} \nabla \bm{h}(\bm{x}_{k_j}) \bm{z}_j &= \nabla \bm{h}(\bm{x}_{k_j}) \bm{z} - \nabla \bm{h}(\bm{x}_{k_j}) \nabla \bm{h}(\bm{x}_{k_j})^T \left[\nabla \bm{h}(\bm{x}_{k_j}) \nabla \bm{h}(\bm{x}_{k_j})^T\right]^{-1} \nabla \bm{h}(\bm{x}_{k_j}) \bm{z} \\ &= \nabla \bm{h}(\bm{x}_{k_j}) \bm{z} - \nabla \bm{h}(\bm{x}_{k_j}) \bm{z} = \bm{0}. \end{aligned}

Next, we consider the limit as j approaches infinity:

\begin{aligned} \lim_{j\to\infty}\bm{z}_j &= \bm{z} - \lim_{j\to\infty} \nabla \bm{h}(\bm{x}_{k_j})^T \left[\nabla \bm{h}(\bm{x}_{k_j}) \nabla \bm{h}(\bm{x}_{k_j})^T\right]^{-1} \nabla \bm{h}(\bm{x}_{k_j})\bm{z} \\ &= \bm{z} - \nabla \bm{h}(\bm{x}^\star)^T \left[\nabla \bm{h}(\bm{x}^\star) \nabla \bm{h}(\bm{x}^\star)^T\right]^{-1} \nabla \bm{h}(\bm{x}^\star) \bm{z}. \end{aligned}

Thus, if \bm{z} lies in the kernel of \nabla \bm{h}(\bm{x}^\star), i.e., \nabla \bm{h}(\bm{x}^\star)\bm{z}= \bm{0}, then \lim_{j\to\infty}\bm{z}_j = \bm{z}.
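The projection used to build \bm{z}_j is itself easy to sketch (NumPy assumed; J is a hypothetical stand-in for \nabla\bm{h}(\bm{x}_{k_j}) with full row rank):

```python
import numpy as np

def project_onto_kernel(J, z):
    # z_j = z - J^T (J J^T)^{-1} J z: orthogonal projection of z onto ker(J)
    return z - J.T @ np.linalg.solve(J @ J.T, J @ z)

J = np.array([[1.0, 1.0]])         # hypothetical constraint Jacobian
z = np.array([1.0, -2.0])
zj = project_onto_kernel(J, z)
print(zj, J @ zj)                  # J z_j = 0 up to round-off
```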

Step 5: Necessary conditions.

Since \bm{x}_{k_j} are unconstrained local minima for f_{k_j}(\bm{x}), it follows that the matrices \nabla^2 f_{k_j}(\bm{x}_{k_j}) are positive semi-definite. This means:

\bm{z}^T \nabla^2 f_{k_j}(\bm{x}_{k_j}) \bm{z} \geq 0, \quad \forall \bm{z} \in \mathbb{R}^n.

Moreover, we can express the Hessian as:

\begin{aligned} \nabla^2 f_{k_j}(\bm{x}_{k_j}) &= \nabla^2 f(\bm{x}_{k_j}) + k_j \nabla^2 \|\bm{h}(\bm{x}_{k_j})\|^2 + \alpha \nabla^2 \|\bm{x}_{k_j} - \bm{x}^\star\|^2 \\ &= \nabla^2 f(\bm{x}_{k_j}) + k_j \nabla^2 \sum_{i=1}^m h_i(\bm{x}_{k_j})^2 + 2\alpha \bm{I}. \end{aligned} \tag{10}

Using the identity for the Hessian of the squared function:

\nabla^2 h_i(\bm{x})^2 = \nabla(2h_i(\bm{x}) \nabla h_i(\bm{x})^T) = 2\nabla h_i(\bm{x})^T\nabla h_i(\bm{x}) + 2h_i (\bm{x})\nabla^2 h_i(\bm{x}),

we can substitute into Equation 10 to obtain:

\begin{aligned} \nabla^2 f_{k_j}(\bm{x}_{k_j}) &= \nabla^2 f(\bm{x}_{k_j}) + 2k_j \sum_{i=1}^m \nabla h_i(\bm{x}_{k_j})^T \nabla h_i(\bm{x}_{k_j}) \\ &\quad + 2k_j \sum_{i=1}^m h_i(\bm{x}_{k_j}) \nabla^2 h_i(\bm{x}_{k_j}) + 2\alpha \bm{I}. \end{aligned} \tag{11}

Let \bm{z} \in \mathbb{R}^n, since 0 \leq \bm{z}^T\nabla^2 f_{k_j}(\bm{x}_{k_j})\bm{z}, we can write:

\begin{aligned} 0 \leq \bm{z}^T\nabla^2 f(\bm{x}_{k_j})\bm{z} & + \sum_{i=1}^m (2k_j h_i(\bm{x}_{k_j}))\bm{z}^T\nabla^2 h_i(\bm{x}_{k_j})\bm{z} \\ & + 2k_j\|\nabla\bm{h}(\bm{x}_{k_j})\bm{z}\|^2 + 2\alpha \|\bm{z}\|^2. \end{aligned}

This inequality holds for all \bm{z} \in \mathbb{R}^n, and specifically for the \bm{z}_j of Step 4, which satisfy Equation 9:

0 \leq \bm{z}_j^T\nabla^2 f(\bm{x}_{k_j})\bm{z}_j + \sum_{i=1}^m (2k_j h_i(\bm{x}_{k_j}))\bm{z}_j^T\nabla^2 h_i(\bm{x}_{k_j})\bm{z}_j + 2\alpha \|\bm{z}_j\|^2.

Taking the limit as j approaches infinity, we apply Equation 7:

0 \leq \bm{z}^T\nabla^2 f(\bm{x}^\star)\bm{z} - \sum_{i=1}^m \lambda_i\bm{z}^T\nabla^2 h_i(\bm{x}^\star)\bm{z}+ 2\alpha \|\bm{z}\|^2.

Since \alpha > 0 can be chosen arbitrarily small, it follows that:

0 \leq \bm{z}^T\nabla^2 f(\bm{x}^\star)\bm{z}-\sum_{i=1}^m\lambda_i\left[\bm{z}^T\nabla^2 h_i(\bm{x}^\star)\bm{z}\right].

Step 6: Sufficient conditions.

Using Equation 8 and the sufficient condition (C), for all \bm{z}\in\ker\nabla\bm{h}(\bm{x}^\star): \begin{aligned} \mathcal{L}(\bm{x^\star}+\bm{z},\bm{\lambda}) & = \mathcal{L}(\bm{x^\star},\bm{\lambda}) + \cancel{\nabla_x\mathcal{L}(\bm{x^\star},\bm{\lambda})}\bm{z} + \dfrac{1}{2}\bm{z}^T\nabla_x^2\mathcal{L}(\bm{x^\star},\bm{\lambda})\bm{z} + \mathit{o}\left(\|\bm{z}\|^2\right) \\ &= \mathcal{L}(\bm{x^\star},\bm{\lambda}) + \dfrac{1}{2}\bm{z}^T\nabla_x^2\mathcal{L}(\bm{x^\star},\bm{\lambda})\bm{z} + \mathit{o}\left(\|\bm{z}\|^2\right) \\ &\geq \mathcal{L}(\bm{x^\star},\bm{\lambda}) + \dfrac{\epsilon}{2}\|\bm{z}\|^2 + \mathit{o}\left(\|\bm{z}\|^2\right) \\ &\geq \mathcal{L}(\bm{x^\star},\bm{\lambda}) + \dfrac{\epsilon}{4}\|\bm{z}\|^2, \qquad \textrm{for small $\bm{z}\in\ker\nabla\bm{h}(\bm{x}^\star)$} \end{aligned} \tag{12}

Moreover, for a feasible displacement \delta\bm{x}, i.e., \bm{h}(\bm{x}^\star+\delta\bm{x})=\bm{0},

\begin{aligned} f(\delta\bm{x}+\bm{x}^\star) &= f(\delta\bm{x}+\bm{x}^\star)- \sum_{k=1}^m \lambda_k h_k(\delta\bm{x}+\bm{x}^\star) \\ & = f(\bm{x}^\star)+ \nabla f(\bm{x}^\star)\delta\bm{x}+ \dfrac{1}{2}\delta\bm{x}^T\nabla^2 f(\bm{x}^\star)\delta\bm{x} \\ & \quad - \sum_{k=1}^m\lambda_k\left( \cancel{h_k(\bm{x}^\star)}+ \nabla h_k(\bm{x}^\star)\delta\bm{x}+ \dfrac{1}{2}\delta\bm{x}^T\nabla^2 h_k(\bm{x}^\star)\delta\bm{x} \right)+\mathit{o}(\|\delta\bm{x}\|^2) \\ & = f(\bm{x}^\star)+\cancel{\nabla_x\mathcal{L}(\bm{x}^\star,\bm{\lambda})}\delta\bm{x}+ \dfrac{1}{2}\delta\bm{x}^T\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\delta\bm{x} +\mathit{o}(\|\delta\bm{x}\|^2) \end{aligned}

If \delta\bm{x}=\bm{z}+\mathit{o}(\|\bm{z}\|), then, using Equation 12, for all \bm{z}\in\ker\nabla\bm{h}(\bm{x}^\star):

\begin{aligned} f(\delta\bm{x}+\bm{x}^\star) & = f(\bm{x}^\star)+ \dfrac{1}{2}\delta\bm{x}^T\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\delta\bm{x} +\mathit{o}(\|\delta\bm{x}\|^2) \\ &\geq f(\bm{x}^\star) -\dfrac{1}{2} \left|\delta\bm{x}^T\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\delta\bm{x} -\bm{z}^T\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\bm{z} \right| \\ & \quad +\dfrac{1}{2}\bm{z}^T\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\bm{z} +\mathit{o}(\|\delta\bm{x}\|^2) \\ &\geq f(\bm{x}^\star) - \dfrac{1}{2}E(\bm{z},\delta\bm{x}) +\dfrac{\epsilon}{4}\|\bm{z}\|^2 +\mathit{o}(\|\delta\bm{x}\|^2) \end{aligned} \tag{13}

where

\begin{aligned} E(\bm{z},\delta\bm{x}) & = \left| \delta\bm{x}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \delta\bm{x} -\bm{z}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \bm{z} \right| \\ & \leq \left| \delta\bm{x}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \delta\bm{x} -\bm{z}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \delta\bm{x} \right| \\ & \quad + \left| \bm{z}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \delta\bm{x} -\bm{z}^T \nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda}) \bm{z} \right| \\ &\leq C\|\delta\bm{x}-\bm{z}\|\,(\|\bm{z}\|+\|\delta\bm{x}\|), \qquad C = \|\nabla_x^2\mathcal{L}(\bm{x}^\star,\bm{\lambda})\| \end{aligned}

if \color{blue}\delta\bm{x}=\bm{z}+\mathit{o}(\|\bm{z}\|) then \mathit{o}(\|\delta\bm{x}\|)=\mathit{o}(\|\bm{z}\|) and

E(\bm{z},\delta\bm{x}) \leq C\,\mathit{o}(\|\bm{z}\|)\cdot(\|\bm{z}\|+\|\bm{z}+\mathit{o}(\bm{z})\|) = \mathit{o}(\|\bm{z}\|^2)

Substituting into Equation 13, for small \delta\bm{x}:

\begin{aligned} f(\delta\bm{x}+\bm{x}^\star) &\geq f(\bm{x}^\star) - \mathit{o}(\|\delta\bm{x}\|^2) +\dfrac{\epsilon}{4}\|\bm{z}\|^2 +\mathit{o}(\|\delta\bm{x}\|^2) \\ & = f(\bm{x}^\star) +\dfrac{\epsilon}{4}\|\delta\bm{x}\|^2 + \mathit{o}(\|\delta\bm{x}\|^2) \\ & \geq f(\bm{x}^\star) +\dfrac{\epsilon}{8}\|\delta\bm{x}\|^2 \qquad\textrm{for small $\delta\bm{x}$} \end{aligned}

and thus \bm{x}^\star is a strict local minimum.

It remains to prove that \color{blue}\delta\bm{x}=\bm{z}+\mathit{o}(\|\bm{z}\|).

TO BE FINISHED…

This establishes all the relations we aimed to prove. \square

Inequality constraints

It is possible to adapt Theorem 1 for inequality constraints. Consider the NLP problem

\begin{aligned} \textrm{minimize:} \quad & f(\bm{x}) \\[0.5em] \textrm{subject to:}\quad & h_i(\bm{x})=0 \quad &i&=1,2,\ldots,m \\[0.5em] & g_i(\bm{x})\geq 0\quad &i&=1,2,\ldots,p \end{aligned} \tag{14}

Introducing the slack variables e_i for i=1,2,\ldots,p and the vector \bm{y}^T=(\bm{x}^T,\bm{e}^T), we obtain the new problem

\begin{aligned} \textrm{minimize:}\quad && \mathsf{f}(\bm{y}) &=f(\bm{x}) \\[0.5em] \textrm{subject to:}\quad && \mathsf{h}_i(\bm{y}) &= h_i(\bm{x})=0\quad &i&=1,2,\ldots,m \\[0.5em] && \mathsf{h}_{i+m}(\bm{y}) &= g_i(\bm{x})-e_i^2 = 0\quad &i&=1,2,\ldots,p \end{aligned} \tag{15}
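A small sketch of this slack reformulation (NumPy assumed; the handles h, g and the sample point are hypothetical):

```python
import numpy as np

def extended_constraints(y, h, g, n):
    # y = (x, e); all constraints of (15) are equalities: h(x)=0, g(x)-e^2=0
    x, e = y[:n], y[n:]
    return np.concatenate([h(x), g(x) - e**2])

h = lambda x: np.array([x[0] + x[1] - 1.0])   # hypothetical equality
g = lambda x: np.array([x[0]])                # hypothetical inequality x1 >= 0
y = np.array([0.5, 0.5, np.sqrt(0.5)])        # e1 = sqrt(g1(x)) makes it feasible
print(extended_constraints(y, h, g, n=2))     # -> [0., 0.]
```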

Using this formulation we deduce:

Lemma 1 (first order condition) Defining the Lagrangian: \mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) = f(\bm{x}) -\sum_{k=1}^m \lambda_k h_k(\bm{x}) -\sum_{\ell=1}^p \mu_\ell g_\ell(\bm{x}) \tag{16}

The first order conditions for problem Equation 15 become

\begin{aligned} \nabla_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) &=\bm{0}^T, \\[1em] h_k(\bm{x}^\star) &= 0, \qquad & k & =1,2,\ldots m\\[1em] \mu_\ell g_\ell(\bm{x}^\star) & = 0,\quad g_\ell(\bm{x}^\star)\geq 0\quad & \ell&=1,2,\ldots p\\[1em] \end{aligned}

Thus, when g_\ell(\bm{x}^\star)>0, then \mu_\ell=0. Moreover, the gradients of the active constraints are assumed to be linearly independent.

To prove the lemma, introduce the extended Lagrangian function:

\mathcal{L}^e(\bm{x},\bm{e},\bm{\lambda},\bm{\mu}) = f(\bm{x}) -\sum_{k=1}^m \lambda_k h_k(\bm{x}) -\sum_{\ell=1}^p \mu_\ell \left(g_\ell(\bm{x})-e_\ell^2\right) \tag{17}

The first order condition becomes

\begin{aligned} \nabla_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) &= \nabla f(\bm{x}^\star) -\sum_{k=1}^m \lambda_k\nabla h_k(\bm{x}^\star) -\sum_{k=1}^p \mu_k\nabla g_k(\bm{x}^\star)=\bm{0}^T, \\[1em] \nabla_e\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) &= 2(\mu_1 e_1,\ldots,\mu_p e_p)=\bm{0}^T, \\[1em] \nabla_{\lambda_k}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) & = -h_k(\bm{x}^\star)= 0, \\[1em] \nabla_{\mu_\ell}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) & = -\left(g_\ell(\bm{x}^\star)-e_\ell^2\right) = 0 \quad\implies\quad g_\ell(\bm{x}^\star) = e_\ell^2 \geq 0, \end{aligned}

Notice that \mu_\ell e_\ell = 0 is equivalent to \mu_\ell e_\ell^2 = 0 and thus \mu_\ell g_\ell(\bm{x}^\star) = 0. Moreover \nabla_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})= \nabla_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}). \square

The second order conditions become:

Lemma 2 (second order necessary condition) Using the Lagrangian Equation 16, the second order necessary conditions for problem Equation 15 become:

\begin{aligned} \bm{z}^T\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{z} & \geq 0, \\[0.5em] \mu_\ell & \geq 0, \qquad \ell = 1,2,\ldots,p \end{aligned}

for all \bm{z} in the kernel of the gradient of the active constraints.

Using the Lagrangian Equation 17 with the slack variables, the necessary condition becomes

\bm{z}^T \nabla^2_{(x,e)}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})\bm{z}\geq 0

for all \bm{z} in the kernel of matrix \begin{pmatrix} \nabla_x \bm{h}(\bm{x}^\star) & \bm{0} \\[0.5em] \nabla_x \bm{g}(\bm{x}^\star) & -2\,\textrm{diag}(e_1,\ldots,e_p) \end{pmatrix} \tag{18}

where

\nabla^2_{(x,e)}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) = \begin{pmatrix} \nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) & \bm{0} \\[0.5em] \bm{0} & \nabla^2_e\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) \end{pmatrix} \tag{19}

and \nabla_x\nabla_e^T\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})=\bm{0}. Moreover

\begin{aligned} \nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) &= \nabla^2 f(\bm{x}^\star) -\sum_{k=1}^m \lambda_k\nabla^2 h_k(\bm{x}^\star) -\sum_{\ell=1}^p \mu_\ell\nabla^2 g_\ell(\bm{x}^\star), \\ \nabla^2_e\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) &= 2\,\textrm{diag}(\mu_1,\mu_2,\ldots,\mu_p). \end{aligned}

After reordering, we can separate the constraints as follows:

\bm{g}(\bm{x})=\begin{pmatrix} \bm{g}^{(A)}(\bm{x})\\ \bm{g}^{(I)}(\bm{x}) \end{pmatrix} \;\textrm{and}\; \left\{\begin{aligned} g_\ell(\bm{x}^\star)&=e_\ell^2=0,\; &\ell&=1,2,\ldots,r\\ g_\ell(\bm{x}^\star)&=e_\ell^2>0,\; &\ell&=r+1,r+2,\ldots,p \end{aligned}\right.

The constraints \bm{g}^{(A)}(\bm{x}^\star) that vanish are the active constraints; the remaining \bm{g}^{(I)}(\bm{x}^\star) are the inactive ones. Thus, Equation 18 in more detail becomes

\begin{pmatrix} \nabla \bm{h}(\bm{x}^\star) & \bm{0} & \bm{0} \\[0.5em] \nabla \bm{g}^{(A)}(\bm{x}^\star) & \bm{0} & \bm{0} \\[0.5em] \nabla \bm{g}^{(I)}(\bm{x}^\star) & \bm{0} & -\bm{E} \end{pmatrix}, \quad \bm{E}= 2\,\textrm{diag}(e_{r+1},\ldots,e_p). \tag{20}

and

\nabla^2_e\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) = \begin{pmatrix} \bm{M} & \bm{0} \\ \bm{0} & \bm{0} \end{pmatrix}, \quad \bm{M} = 2\,\textrm{diag}(\mu_1,\mu_2,\ldots,\mu_r) \tag{21}

The kernel of Equation 20 can be written as

\underline{\mathbf{K}} = \begin{pmatrix} \bm{K}^{a} & \bm{0} \\[0.5em] \bm{0} & \bm{I} \\[0.5em] \bm{E}^{-1}\nabla_x \bm{g}^{(I)}(\bm{x}^\star) \bm{K}^{a}& \bm{0} \end{pmatrix}, \tag{22}

where \bm{K}^{a} is the kernel of active constraints given by the matrix

\begin{pmatrix} \nabla \bm{h}(\bm{x}^\star) \\[0.5em] \nabla \bm{g}^{(A)}(\bm{x}^\star) \end{pmatrix} \quad\implies\quad \begin{pmatrix} \nabla \bm{h}(\bm{x}^\star) \\[0.5em] \nabla \bm{g}^{(A)}(\bm{x}^\star) \end{pmatrix} \bm{K}^{a}=\bm{0} \tag{23}

thus \bm{z} can be written as \bm{z}=\underline{\mathbf{K}}\bm{d}, and the second order condition \bm{z}^T \nabla^2_{(x,e)}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})\bm{z}\geq 0 becomes

0 \leq \bm{d}^T \left[\underline{\mathbf{K}}^T \nabla^2_{(x,e)}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) \underline{\mathbf{K}}\right]\bm{d}, \qquad \bm{d}\in\Bbb{R}^s

and using Equation 22 with Equation 19 and Equation 21

\begin{aligned} &\left[\underline{\mathbf{K}}^T \nabla^2_{(x,e)}\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) \underline{\mathbf{K}}\right] \\ &\qquad= \underline{\mathbf{K}}^T \begin{pmatrix} \nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) & \bm{0} & \bm{0} \\[0.5em] \bm{0} & \bm{M} & \bm{0} \\[0.5em] \bm{0} & \bm{0} & \bm{0} \end{pmatrix} \underline{\mathbf{K}}, \\[1em] &\qquad= \underline{\mathbf{K}}^T \begin{pmatrix} \nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})\bm{K}^{a} & \bm{0} \\[0.5em] \bm{0} & \bm{M} \\[0.5em] \bm{0} & \bm{0} \end{pmatrix}, \\[1em] &\qquad= \begin{pmatrix} \bm{K}^{a,T}\nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu})\bm{K}^a & \bm{0} \\[0.5em] \bm{0} & \bm{M} \end{pmatrix} \end{aligned} \tag{24}

Using the equality \nabla^2_x\mathcal{L}^e(\bm{x}^\star,\bm{e},\bm{\lambda},\bm{\mu}) = \nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) \tag{25}

the second order conditions become: \nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) must be positive semi-definite on the kernel of the gradients of the active constraints, and the matrix \bm{M} must be positive semi-definite, or equivalently \mu_\ell\geq 0 for the multipliers of the active constraints (recall that for the inactive constraints \mu_\ell=0). \square

Lemma 3 (second order sufficient condition) Using the Lagrangian Equation 16, the second order sufficient conditions for problem Equation 15 become:

\begin{aligned} \bm{z}^T\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{z} & \geq \epsilon \|\bm{z}\|^2, \\[0.5em] \mu_\ell & = 0, \qquad \textrm{for $g_\ell(\bm{x}^\star)>0$} \\[0.5em] \mu_\ell & > 0, \qquad \textrm{for $g_\ell(\bm{x}^\star)=0$} \end{aligned}

for all \bm{z} in the kernel of the gradient of the active constraints.

Using Equation 24 with Equation 25, the sufficient condition becomes that the Hessian satisfies

\bm{z}^T\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{z} \geq \epsilon \|\bm{z}\|^2

for all \bm{z} in the kernel of the gradients of active constraints.

Moreover, the matrix \bm{M} must be positive definite, i.e., \mu_\ell>0 for all active constraints. \square

The last two lemmas can be summarized as follows:

First and second order KKT conditions
  • First order condition: for the NLP Equation 14

    \begin{aligned} (a) & \qquad &\nabla_x\mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) & = \bm{0} \\ (b) & \qquad &h_k(\bm{x}) &= 0 \\ (c) & \qquad &\mu_\ell g_\ell(\bm{x}) &=0 \\[1em] (d) & \qquad &g_\ell(\bm{x}) &\geq 0 \\ (e) & \qquad &\mu_\ell & \geq 0 \end{aligned}

  • Second order condition:
    The matrix \bm{K}^a spans the kernel of the gradients of the active constraints.

    • Necessary: the matrix \bm{K}^{a,T}\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{K}^a must be positive semi-definite.

    • Sufficient: the matrix \bm{K}^{a,T}\nabla^2_x\mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu})\bm{K}^a must be positive definite and \mu_\ell>0 for the active constraints, i.e., when g_\ell(\bm{x}^\star)=0.

Constrained minima, NLP problem

Consider the constrained minimization problem

\begin{aligned} \textrm{minimize:}\quad && f(\bm{x})& \\[0.5em] \textrm{subject to:}\quad && h_i(\bm{x}) &=0\quad &i&=1,2,\ldots,m \\[0.5em] && g_\ell(\bm{x}) &\geq 0\quad &\ell&=1,2,\ldots,p \end{aligned} \tag{26}

Solution Algorithm
  1. Compute the Lagrangian Function: The Lagrangian function is defined as: \mathcal{L}(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) - \sum_{k=1}^m \lambda_k h_k(\bm{x}) - \sum_{\ell=1}^p \mu_\ell g_\ell(\bm{x}).

  2. Solve the Nonlinear System: We need to solve the following system of equations: \begin{aligned} \nabla_x \mathcal{L}(\bm{x}, \bm{\lambda}, \bm{\mu}) &= \bm{0}, \\[0.5em] h_k(\bm{x}) &= 0, \quad & k&=1,2,\ldots,m, \\[0.5em] \mu_\ell g_\ell(\bm{x}) &= 0, \quad & \ell&=1,2,\ldots,p. \end{aligned} Keep only the solutions where \mu_\ell^\star \geq 0 and g_\ell(\bm{x}^\star) \geq 0.

  3. Check Active Constraints: For each solution point (\bm{x}^\star, \bm{\lambda}^\star, \bm{\mu}^\star):

    • Compute \nabla \bm{h}(\bm{x}^\star) and \nabla g_\ell(\bm{x}^\star) for the active constraints (g_\ell(\bm{x}^\star) = 0).
    • Verify that these gradients are linearly independent.
  4. Compute the Kernel Matrix: Construct the matrix \bm{K}^a spanning the common kernel of \nabla \bm{h}(\bm{x}^\star) and of the gradients \nabla g_\ell(\bm{x}^\star) of the active constraints.

  5. Compute the Reduced Hessian: The reduced Hessian is calculated as: \bm{H} = \bm{K}^{a,T}\nabla_x^2\mathcal{L}(\bm{x}^\star, \bm{\lambda}^\star, \bm{\mu}^\star)\bm{K}^a

  6. Evaluate Conditions:

    • Necessary Condition: Ensure that \bm{H} is positive semi-definite.
    • Sufficient Condition: Confirm that \bm{H} is positive definite and that \mu_\ell > 0 for all active constraints.
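A minimal numerical sketch of these steps (NumPy/SciPy assumed; the toy problem min x_1^2+x_2^2 subject to g(\bm{x})=x_1+x_2-1\geq 0, with candidate \bm{x}^\star=(1/2,1/2) and \mu=1, is our own illustration):

```python
import numpy as np
from scipy.linalg import null_space

x_star = np.array([0.5, 0.5])       # candidate from steps 1-2 (hypothetical)
mu     = 1.0                        # multiplier of the single active constraint
grad_g = np.array([[1.0, 1.0]])     # gradient of g, active since g(x*) = 0
hess_L = 2.0 * np.eye(2)            # Hessian of the Lagrangian at x*

# Steps 3-4: gradients of active constraints and their kernel K^a.
Ka = null_space(grad_g)

# Steps 5-6: reduced Hessian, plus positivity of the active multiplier.
H = Ka.T @ hess_L @ Ka
sufficient = bool(np.all(np.linalg.eigvalsh(H) > 0) and mu > 0)
print(H, sufficient)                # H positive definite and mu > 0 -> minimum
```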


Definition 1 (admissible point) A point \bm{x}^\star is admissible if

\begin{aligned} h_k(\bm{x}^\star) &= 0 \qquad & k&=1,2, \ldots ,m\\ g_\ell(\bm{x}^\star) &\geq 0 \qquad & \ell&=1,2, \ldots ,p\\ \end{aligned}


Definition 2 (Feasible region) \begin{aligned} \mathcal{F} = \Big\{ \bm{x}\in\Bbb{R}^n\;|\; & h_k(\bm{x})=0,\quad & k&=1,2,\ldots,m,\\ & g_\ell(\bm{x}) \geq 0,\quad & \ell&=1,2,\ldots,p, \Big\} \end{aligned}

is called the feasible region or set of feasible points.


Definition 3 (active constraints) The following set

\mathcal{A}(\bm{x}^\star) = \{ k\;|\; g_k(\bm{x}^\star)=0\}

is called the active constraints set. This set can be split into two subsets

\begin{aligned} \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*) &= \{ k\;|\; g_k(\bm{x}^\star)=0,\quad \mu ^*_k>0\} \\ \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*) &= \{ k\;|\; g_k(\bm{x}^\star)=0,\quad \mu ^*_k=0\} \end{aligned}

\mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*) are the strongly active constraints and \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*) are the weakly active constraints.

Obviously

\mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*)\bigcap \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*)=\emptyset \quad\textrm{and}\quad \mathcal{A}^0(\bm{x}^\star,\bm{\mu }^*)\bigcup \mathcal{A}^+(\bm{x}^\star,\bm{\mu }^*)=\mathcal{A}(\bm{x}^\star)
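For completeness, a small sketch computing these index sets (NumPy assumed; the tolerance tol and the sample data are our own choices):

```python
import numpy as np

def active_sets(g_vals, mu_vals, tol=1e-10):
    A  = {k for k, gk in enumerate(g_vals) if abs(gk) <= tol}   # active
    Ap = {k for k in A if mu_vals[k] >  tol}                    # strongly active
    A0 = {k for k in A if mu_vals[k] <= tol}                    # weakly active
    return A, Ap, A0

print(active_sets(np.array([0.0, 0.3, 0.0]), np.array([2.0, 0.0, 0.0])))
# -> ({0, 2}, {0}, {2})
```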

Constrained minima and KKT

Here, we present classical theorems of constrained minimization, including those by Karush, Kuhn, Tucker, and others, without providing detailed proofs.

The following theorem (see John (1948)) establishes the necessary conditions for constrained minima. Note that no specific conditions on the constraints are required.

Theorem 2 (Fritz John) If the functions f(\bm{x}), g_1(\bm{x}), \ldots, g_p(\bm{x}), are differentiable, then a necessary condition that \bm{x}^\star be a local minimum to problem:

\begin{aligned} \textrm{minimize:}\qquad & f(\bm{x}) \\[0.5em] \textrm{subject to:}\qquad & g_i(\bm{x})\geq 0\qquad &i=1,2,\ldots,p \end{aligned}

is that there exist scalars \mu_0^\star, \mu_1^\star, \ldots, \mu_p^\star, (not all zero) such that the following inequalities and equalities are satisfied:

\begin{aligned} \nabla_x \mathcal{L}(\bm{x}^\star,\bm{\mu}^*) &=\bm{0}^T & \\ \mu_k^*g_k(\bm{x}^\star) &= 0,\qquad & k&=1,2,\ldots,p; \\ \mu_k^* &\geq 0,\qquad & k&=0,1,2,\ldots,p; \end{aligned}

where

\mathcal{L}(\bm{x},\bm{\mu}) = \mu_0 f(\bm{x}) -\sum_{k=1}^p \mu_k\,g_k(\bm{x})


In Kuhn and Tucker (1951) the authors showed that if a condition, called the first order constraint qualification, holds at \bm{x}^\star, \bm{\mu}^\star, then \mu_0 can be taken equal to 1.

Definition 4 (Constraints qualification LI) Given the unilateral (inequality) constraints \bm{g}(\bm{x}) and the bilateral (equality) constraints \bm{h}(\bm{x}), the point \bm{x}^\star is admissible if

g_k(\bm{x}^\star) \geq 0, \qquad h_k(\bm{x}^\star) = 0.

The constraints \bm{g}(\bm{x}) and \bm{h}(\bm{x}) are qualified at \bm{x}^\star if the point \bm{x}^\star is admissible and the vectors

\{\nabla g_k(\bm{x}^\star)\;:\;k\in \mathcal{A}(\bm{x}^\star)\}\; \bigcup\; \{ \nabla h_1(\bm{x}^\star),\nabla h_2(\bm{x}^\star),\ldots,\nabla h_m(\bm{x}^\star)\}

are linearly independent.

The next theorems are taken from Nocedal and Wright (2006) and McCormick (1967).

Theorem 3 (First order necessary conditions) Let f\in C^1(\Bbb{R}^n) and the constraints \bm{g}\in C^1(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^1(\Bbb{R}^n,\Bbb{R}^m). Suppose that \bm{x}^\star is a local minimum of Equation 26 and that the constraints qualification LI holds at \bm{x}^\star. Then there are Lagrange multiplier vectors \bm{\lambda} and \bm{\mu} such that the following conditions are satisfied at (\bm{x}^\star,\bm{\lambda},\bm{\mu}):

\begin{aligned} \nabla_x \mathcal{L}(\bm{x}^\star,\bm{\lambda},\bm{\mu}) & =\bm{0}^T & \\[0.5em] h_k(\bm{x}^\star) &=0,\qquad & k&=1,2,\ldots,m; \\[0.5em] \mu_\ell^*g_\ell(\bm{x}^\star) &= 0,\qquad & \ell&=1,2,\ldots,p; \\[0.5em] \mu_\ell^*&\geq 0,\qquad & \ell&=1,2,\ldots,p; \end{aligned}

where

\mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) = f(\bm{x}) -\sum_{k=1}^m \lambda_k\,h_k(\bm{x}) -\sum_{\ell=1}^p \mu_\ell\,g_\ell(\bm{x})


Theorem 4 (Second order necessary conditions) Let f\in C^2(\Bbb{R}^n) and the constraints \bm{g}\in C^2(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m).

Let \bm{x}^\star satisfy the first order necessary conditions. A necessary condition for \bm{x}^\star to be a local minimum is that the m+p scalars (Lagrange multipliers) of the first order necessary conditions satisfy:

\bm{d}^T \nabla_x^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^*,\bm{\mu}^*) \bm{d} \color{blue}\geq 0

for all \bm{d} such that:

\nabla h_k(\bm{x}^\star)\bm{d} =0, \qquad \textrm{for}\quad k=1,2,\ldots,m

and for all active inequalities \ell\in \mathcal{A}(\bm{x}^\star):

easy condition

\color{blue}\nabla g_\ell(\bm{x}^\star)\bm{d} \color{blue}=0

more accurate condition

\begin{aligned} \color{DarkGreen}\nabla g_\ell(\bm{x}^\star)\bm{d}&=0 \quad\textrm{and}\quad\mu_\ell>0\\[1em] \color{DarkGreen}\nabla g_\ell(\bm{x}^\star)\bm{d}& \geq 0 \quad\textrm{and}\quad\mu_\ell=0 \end{aligned}


Remark 1. The conditions

\begin{aligned} \color{DarkGreen} \nabla g_\ell(\bm{x}^\star)\bm{d}=0, \qquad & \color{DarkGreen} \textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$ and $\mu_\ell>0$} \\[1em] \color{DarkGreen} \nabla g_\ell(\bm{x}^\star)\bm{d}\geq 0, \qquad & \color{DarkGreen} \textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$ and $\mu_\ell=0$} \end{aligned}

restrict the space of directions to be considered. If replaced with

{\color{blue}\nabla g_\ell(\bm{x}^\star)\bm{d}=0,}\qquad {\color{blue}\textrm{if $\ell\in \mathcal{A}(\bm{x}^\star)$}}

Theorem 4 remains valid, because the necessary condition is then tested on a smaller set.

Theorem 5 (Second order sufficient conditions) Let f\in C^2(\Bbb{R}^n) and the constraints \bm{g}\in C^2(\Bbb{R}^n,\Bbb{R}^p) and \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m).

Let \bm{x}^\star satisfy the first order necessary conditions. A sufficient condition for \bm{x}^\star to be a local minimum is that the m+p scalars (Lagrange multipliers) of the first order necessary conditions satisfy:

\bm{d}^T \nabla_x^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^*,\bm{\mu}^*) \bm{d}\color{blue}\;\;>\;\; 0

for all \bm{d}\neq\bm{0} such that

\nabla h_k(\bm{x}^\star)\bm{d} =0, \qquad \textrm{for}\quad k=1,2,\ldots,m

and for all active inequalities \ell\in \mathcal{A}(\bm{x}^\star):

easy condition

\color{blue}\nabla g_\ell(\bm{x}^\star)\bm{d}=0, \quad\textrm{and}\quad \mu_\ell>0

more accurate condition

\begin{aligned} \color{blue}\nabla g_\ell(\bm{x}^\star)\bm{d}=0,\quad & \color{blue}\textrm{and}\quad \mu_\ell>0 \\ \color{Magenta}\nabla g_\ell(\bm{x}^\star)\bm{d}\geq 0, \quad & \color{Magenta}\textrm{and} \quad \mu_\ell=0 \end{aligned}

Remark 2. The condition

{\color{Magenta}\nabla g_k(\bm{x}^\star)\bm{d}\geq 0,\qquad \textrm{if $k\in \mathcal{A}(\bm{x}^\star)$ and $\mu_k=0$}}

restricts the space of directions to be considered. If omitted, Theorem 5 remains valid, because the sufficient condition is then tested on a larger set.

Other constraints qualifications

In the study of optimality conditions the constraints and their gradients cannot be arbitrary: they must satisfy additional analytic/geometric properties. These properties are called constraint qualifications. The simplest (but also the most restrictive) qualification is linear independence (LI).

Definition 5 (Constraints qualification LI) Given the inequality constraints \bm{g}(\bm{x}) and equality constraints \bm{h}(\bm{x}), we say that an admissible point \bm{x}^\star is qualified if the vectors

\{\nabla g_k(\bm{x}^\star)\;:\;k\in \mathcal{A}(\bm{x}^\star)\} \;\bigcup\; \{ \nabla h_1(\bm{x}^\star),\nabla h_2(\bm{x}^\star),\ldots ,\nabla h_m(\bm{x}^\star)\}

are linearly independent.

Mangasarian-Fromovitz qualification

This qualification is less stringent than the previous one.

Definition 6 (Mangasarian-Fromovitz qualification) Given the inequality constraints \bm{g}(\bm{x}) and equality constraints \bm{h}(\bm{x}), we say that an admissible point \bm{x}^\star is qualified if there exists no vanishing linear combination

\sum_{\ell\in \mathcal{A}(\bm{x}^\star)} \alpha_\ell \nabla g_\ell(\bm{x}^\star)+ \sum_{k=1}^m \beta_k \nabla h_k(\bm{x}^\star) = \bm{0}

with \alpha_\ell\geq 0 for \ell\in \mathcal{A}(\bm{x}^\star) and the \alpha_\ell, \beta_k not all zero. That is, there is no nontrivial linear combination of the gradients yielding the null vector with \alpha_\ell\geq 0 for \ell\in \mathcal{A}(\bm{x}^\star).

Garth P. McCormick qualification

Definition 7 (Garth P. McCormick constraints qualification (1st order)) Given an admissible point \bm{x}^\star, the constraints are first order qualified if for every direction \bm{d} that satisfies

\begin{aligned} \nabla h_k(\bm{x}^\star)\bm{d} &=0,\quad k\in \{1,2, \ldots ,m\}, \\ \nabla g_\ell(\bm{x}^\star)\bm{d} &\geq 0,\quad \ell\in\mathcal{A}(\bm{x}^\star), \end{aligned}

there exist a curve \bm{x}\in \mathtt{C}^1(\Bbb{R},\Bbb{R}^n) and an \varepsilon>0 such that, for 0<t<\varepsilon,

\begin{aligned} \bm{x}(0)&=\bm{x}^\star,\quad & h_k(\bm{x}(t))&=0, \quad k\in \{1,2, \ldots ,m\}, \\ \bm{x}'(0) &=\bm{d}, \quad & g_\ell(\bm{x}(t))&\geq 0, \quad \ell\in \{1,2, \ldots ,p\}. \end{aligned}


Definition 8 (Garth P. McCormick constraints qualification (2nd order)) Given an admissible point \bm{x}^\star, the constraints are second order qualified if for every direction \bm{d} that satisfies

\begin{aligned} \nabla h_k(\bm{x}^\star)\bm{d} &=0,\qquad k\in \{1,2, \ldots ,m\}, \\ \nabla g_\ell(\bm{x}^\star)\bm{d}&=0,\qquad \ell\in \mathcal{A}(\bm{x}^\star), \end{aligned}

there exist a curve \bm{x}\in \mathtt{C}^2(\Bbb{R},\Bbb{R}^n) and an \varepsilon>0 such that, for 0<t<\varepsilon,

\begin{aligned} \bm{x}(0) &=\bm{x}^\star, \quad & h_k(\bm{x}(t))&= 0,\quad & k &\in \{1,2, \ldots ,m\}, \\ \bm{x}'(0) &=\bm{d}, \quad & g_\ell(\bm{x}(t))&=0,\quad & \ell & \in \mathcal{A}(\bm{x}^\star). \end{aligned}

Sylvester’s Criterion for SPD Matrices

Theorem 6 (Sylvester’s Criterion) A symmetric matrix \bm{A} \in \Bbb{R}^{n \times n} is positive definite if and only if all its leading principal minors are positive. That is, \bm{A} is positive definite if and only if: \det(\bm{A}_k) > 0 \quad \text{for} \quad k = 1, 2, \ldots, n where \bm{A}_k is the upper-left k \times k submatrix (leading principal submatrix) of \bm{A}.
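Before the proof, a direct computational rendering of the statement (a sketch assuming NumPy; the determinant test mirrors the criterion, although for numerical work a Cholesky or eigenvalue test is preferable):

```python
import numpy as np

def is_positive_definite(A):
    # Sylvester's criterion: all leading principal minors must be positive.
    A = np.asarray(A, dtype=float)
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))   # True  (minors 2, 3)
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # False (second minor -3)
```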

(1) Necessity (if \bm{A} is positive definite, then all leading principal minors are positive):

Assume \bm{A} is a symmetric positive definite matrix. This means that for any nonzero vector \bm{x} \in \Bbb{R}^n, we have: \bm{x}^\top \bm{A} \bm{x} > 0

Let \bm{A}_k denote the upper-left k \times k submatrix of \bm{A}. Each \bm{A}_k is symmetric and positive definite: given any nonzero \bm{x}_k = (x_1, x_2, \dots, x_k)^\top \in \Bbb{R}^k, extend it with zeros to \bm{x} \in \Bbb{R}^n; then \bm{x}_k^\top \bm{A}_k \bm{x}_k = \bm{x}^\top \bm{A} \bm{x} > 0.

Since the determinant of each \bm{A}_k is the product of its (positive) eigenvalues, it follows that \det(\bm{A}_k) > 0 for all k = 1, 2, \ldots, n.

(2) Sufficiency (if all leading principal minors are positive, then \bm{A} is positive definite):
  • Base case: For n = 1, \bm{A}_1 = [a_{11}]. If \det(\bm{A}_1) > 0, then a_{11} > 0, and \bm{A}_1 is positive definite.

  • Induction Step:

    Now, assume that all leading principal minors of \bm{A} are positive: \det(\bm{A}_k) > 0 \quad \text{for} \quad k = 1, 2, \ldots, n

    We aim to show that \bm{A} is positive definite. Let us use induction on the dimension n.

    • Block Matrix Decomposition:

      We decompose the matrix \bm{A}_k into block form. The matrix \bm{A}_k can be written as: \bm{A}_k = \begin{bmatrix} \bm{A}_{k-1} & \bm{b} \\ \bm{b}^\top & c \end{bmatrix} where:

      • \bm{A}_{k-1} is the upper-left (k-1) \times (k-1) submatrix,
      • \bm{b} \in \Bbb{R}^{k-1} is a column vector, and
      • c \in \Bbb{R} is a scalar, corresponding to the k-th diagonal element of \bm{A}_k.
    • Factorize \bm{A}_k \begin{bmatrix} \bm{A}_{k-1} & \bm{b} \\ \bm{b}^\top & c \end{bmatrix} = \begin{bmatrix} \bm{I} & \bm{0} \\ \bm{b}^\top\bm{A}_{k-1}^{-\top} & 1 \end{bmatrix} \begin{bmatrix} \bm{A}_{k-1} & \bm{0} \\ \bm{0}^\top & d \end{bmatrix} \begin{bmatrix} \bm{I} & \bm{A}_{k-1}^{-1}\bm{b} \\ \bm{0}^\top & 1 \end{bmatrix} \tag{27} where d=c-\bm{b}^\top\bm{A}_{k-1}^{-1}\bm{b} is the Schur complement of \bm{A}_{k-1} in \bm{A}_k

    • Positivity of d:

      From \det(\bm{A}_j)>0 for j\leq k-1, the induction hypothesis gives that \bm{A}_{k-1} is positive definite. Moreover, taking determinants in Equation 27, \det(\bm{A}_{k-1})\, d = \det(\bm{A}_k) > 0,\quad \implies\quad d > 0

    • SPD of \bm{A}_k:

      From Equation 27, writing \bm{x} = (\bm{y}^\top, z)^\top \neq \bm{0}:

      \begin{aligned} \bm{x}^\top\bm{A}_k\bm{x} &= \begin{bmatrix} \bm{y}^\top & z \end{bmatrix} \begin{bmatrix} \bm{A}_{k-1} & \bm{b} \\ \bm{b}^\top & c \end{bmatrix} \begin{bmatrix} \bm{y} \\ z \end{bmatrix} \\ &= \begin{bmatrix} \bm{y}^\top & z \end{bmatrix} \begin{bmatrix} \bm{I} & \bm{0} \\ \bm{b}^\top\bm{A}_{k-1}^{-\top} & 1 \end{bmatrix} \begin{bmatrix} \bm{A}_{k-1} & \bm{0} \\ \bm{0}^\top & d \end{bmatrix} \begin{bmatrix} \bm{I} & \bm{A}_{k-1}^{-1}\bm{b} \\ \bm{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \bm{y} \\ z \end{bmatrix} \\ &= \begin{bmatrix} \bm{y}^\top+z\bm{b}^\top\bm{A}_{k-1}^{-\top} & z \end{bmatrix} \begin{bmatrix} \bm{A}_{k-1} & \bm{0} \\ \bm{0}^\top & d \end{bmatrix} \begin{bmatrix} \bm{y}+z\bm{A}_{k-1}^{-1}\bm{b} \\ z \end{bmatrix} \\ &= \begin{bmatrix} \bm{w}^\top & z \end{bmatrix} \begin{bmatrix} \bm{A}_{k-1} & \bm{0} \\ \bm{0}^\top & d \end{bmatrix} \begin{bmatrix} \bm{w} \\ z \end{bmatrix} \\ & = \bm{w}^\top\bm{A}_{k-1}\bm{w}+d\,z^2 > 0 \end{aligned}

      where \bm{w}=\bm{y}+z\bm{A}_{k-1}^{-1}\bm{b} and [\bm{w}^\top,z]\neq\bm{0}^\top.

This concludes the proof. \square
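The block factorization Equation 27 and the determinant identity \det(\bm{A}_k) = \det(\bm{A}_{k-1})\,d used above can be checked numerically; a sketch (the random 4\times 4 SPD test matrix is our own assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4.0 * np.eye(4)           # a symmetric positive definite matrix

k = A.shape[0]
Akm1, b, c = A[:k-1, :k-1], A[:k-1, k-1], A[k-1, k-1]
d = c - b @ np.linalg.solve(Akm1, b)    # Schur complement d = c - b^T A_{k-1}^{-1} b

L = np.eye(k)
L[k-1, :k-1] = np.linalg.solve(Akm1, b) # b^T A_{k-1}^{-T} = (A_{k-1}^{-1} b)^T
D = np.zeros((k, k))
D[:k-1, :k-1] = Akm1
D[k-1, k-1] = d

print(np.allclose(L @ D @ L.T, A))                             # A_k = L D L^T
print(np.isclose(np.linalg.det(Akm1) * d, np.linalg.det(A)))   # det identity
```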

Trick for positive semi-definite matrices

Sylvester's theorem gives no way to decide whether a matrix is positive semi-definite when a leading principal minor is equal to zero. However, one can exploit the fact that \bm{A} is positive semi-definite if and only if the matrix

\bm{A} +\varepsilon \bm{I}

is positive definite for all \varepsilon > 0, and hence in particular for \varepsilon \rightarrow 0^+.

Example 1 (Using the Sylvester trick on a positive semi-definite matrix) Consider the matrix

\bm{A} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}

The first leading minor \det\bm{A}_1 = \det[1] = 1 is positive, so we compute the second leading principal minor, which turns out to be zero:

\det\bm{A}_2 = \det\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = 4 - 2\cdot 2 = 0

This gives no information on whether \bm{A} is positive semi-definite or not, but using the trick defined above we can compute

\det\begin{bmatrix} 1+\varepsilon & 2 \\ 2 & 4 + \varepsilon \end{bmatrix} = (1+ \varepsilon)(4+\varepsilon) - 4 = 5\varepsilon + \varepsilon^2 = \varepsilon(5+\varepsilon)

This expression is strictly positive for all \varepsilon > 0, so we conclude that \bm{A} is positive semi-definite.

The same result can be obtained by computing the eigenvalues of \bm{A}, i.e., the roots of the polynomial

\begin{aligned} p(\lambda) &= \det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} \\ & = (1-\lambda)(4-\lambda) - 4 \\ & = \lambda^2-5\lambda \qquad \implies \quad \lambda_1=0,\; \lambda_2 = 5 \end{aligned}

Having one eigenvalue equal to zero (and the other positive) shows that \bm{A} is positive semi-definite.


Example 2 (Using the Sylvester trick on a matrix that is not positive semi-definite) Consider the matrix

\bm{A} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}

The first leading minor \det\bm{A}_1 = \det[1] = 1 is positive, while the second one is zero; in fact

\bm{A}_2 = \det\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = 0

This gives no information, but we can use the trick by considering the determinant

\det\begin{bmatrix} 1 + \varepsilon & 1 \\ 1 & 1 + \varepsilon \end{bmatrix} = (1+\varepsilon)^2 - 1 = \varepsilon^2 +2\varepsilon > 0

for \varepsilon > 0.

Since \bm{A} is a 3\times3 matrix, we also need the third leading minor, which is again zero:

\begin{aligned} \det\bm{A}_3 &= \det \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \\ & = 1 \det\begin{bmatrix} 1&1\\1&0 \end{bmatrix} - 1 \det\begin{bmatrix}1&1\\1&0\end{bmatrix} + 1 \det\begin{bmatrix}1&1\\1&1\end{bmatrix} \\ & = -1 + 1 + 0 = 0 \end{aligned}

We apply the trick once more, considering the determinant

\begin{aligned} p(\varepsilon) & = \det\begin{bmatrix} 1+\varepsilon & 1 & 1 \\ 1 & 1+\varepsilon & 1 \\ 1 & 1 & \varepsilon\end{bmatrix} \\ & = (1+\varepsilon)\big(\varepsilon^2 + \varepsilon - 1\big) - (\varepsilon-1) - \varepsilon \\ & = \varepsilon\big(\varepsilon^2 + 2\varepsilon - 2\big) \end{aligned}

Since p(0) = 0 and the derivative

p'(\varepsilon) =3\varepsilon^2 + 4\varepsilon -2

is negative at \varepsilon = 0, we have p(\varepsilon) < 0 for small \varepsilon > 0, and we conclude that \bm{A} is not positive semi-definite.

The same result can be confirmed by calculating the eigenvalues of the matrix:

p(\lambda) = \det\begin{bmatrix} 1 -\lambda & 1 & 1 \\ 1 & 1 -\lambda & 1 \\ 1 & 1 & -\lambda \end{bmatrix} = -\lambda\big(\lambda^2 - 2\lambda - 2\big)

\implies \qquad \lambda_1 = 0 \qquad \lambda_{2,3} = \frac{2\pm \sqrt{12}}{2} = 1 \pm \sqrt 3

Since \lambda_3 = 1-\sqrt{3} < 0 is negative, the matrix \bm{A} is certainly not positive semi-definite.


This trick can be summarized in the following lemma.

Lemma 4 (Sylvester’s Trick) A symmetric matrix \bm{A} \in \mathbb{R}^{n \times n} is positive semi-definite if and only if every leading principal submatrix \bm{A}_k satisfies one of the following conditions:

  • \det(\bm{A}_k) > 0, or

  • \det(\bm{A}_k) = 0 and the perturbed determinant is positive for all sufficiently small \varepsilon > 0:

    \det(\bm{A}_k + \varepsilon \bm{I}) > 0 \quad \text{for } \varepsilon \to 0^+,

    i.e., the first nonvanishing derivative of \det(\bm{A}_k + \varepsilon\bm{I}) at \varepsilon = 0 is positive.
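A sketch of the lemma in practice (SymPy assumed; the matrix is the one of Example 2): expand \det(\bm{A}_k + \varepsilon\bm{I}) for each leading submatrix and read the sign of its lowest-order term in \varepsilon.

```python
import sympy as sp

eps = sp.symbols('epsilon', positive=True)
A = sp.Matrix([[1, 1, 1], [1, 1, 1], [1, 1, 0]])   # matrix of Example 2

for k in range(1, 4):
    p = sp.expand((A[:k, :k] + eps * sp.eye(k)).det())
    print(k, p)   # sign of the lowest-order term decides, for small eps > 0

# k = 3 gives epsilon**3 + 2*epsilon**2 - 2*epsilon, negative for small
# eps > 0: A is not positive semi-definite, as found in Example 2.
```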

References

Bertsekas, Dimitri P. 2016. Nonlinear Programming. 3rd ed. Athena Scientific.
Boyd, S. P., and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
Fletcher, Roger. 2000. Practical Methods of Optimization. 2nd ed. John Wiley & Sons.
John, F. 1948. “Extremum Problems with Inequalities as Subsidiary Conditions.” In Studies and Essays: Courant Anniversary Volume, edited by K. O. Friedrichs, O. E. Neugebauer, and J. J. Stoker, 187–204. New York: Wiley-Interscience.
Kuhn, H. W., and A. W. Tucker. 1951. “Nonlinear Programming.” In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, edited by J. Neyman, 481–92. University of California Press, Berkeley, California.
McCormick, G. 1967. “Second Order Conditions for Constrained Minima.” SIAM Journal on Applied Mathematics 15 (3): 641–52. https://doi.org/10.1137/0115056.
Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. 2nd ed. Springer Science & Business Media. https://doi.org/10.1007/978-0-387-40065-5.