Constrained Minimization
Calculus Review
When minimizing a function, certain regularity conditions must be assumed to simplify the analysis and prove meaningful results. The first condition is a minimal regularity requirement for the function being studied.
Definition 1 (first order regularity) Let f: \mathbb{R}^n \to \mathbb{R} be a multivariate function satisfying the following properties:
f \in C^1(\mathbb{R}^n), meaning f(\bm{x}) is continuously differentiable, and the gradient (row) vector \nabla f(\bm{x}) exists.
The gradient \nabla f(\bm{x}) is Lipschitz continuous, i.e.,
\|\nabla f(\bm{x}) - \nabla f(\bm{y})\| \leq \gamma \|\bm{x} - \bm{y}\|, \qquad \forall \bm{x}, \bm{y} \in \mathbb{R}^n
With these regularity assumptions, the Taylor series for f can be expanded up to the first order as follows:
f(\bm{x} + t\bm{d}) = f(\bm{x}) + t\nabla f(\bm{x})\bm{d} + \mathcal{o}(t)
where the remainder term \mathcal{o}(t) satisfies
\lim_{t \to 0} \frac{\mathcal{o}(t)}{t} = 0
This expansion does not require the existence of second-order derivatives, but it does require that the gradient is Lipschitz continuous.
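The expansion can be checked numerically. Below is a minimal sketch (assuming Python with numpy, and using as test function the f(x,y) = \mathrm{e}^{x^2-y^2} that appears later in Example 1): the ratio between the remainder and t must tend to zero as t shrinks.

```python
import numpy as np

# Test function and its analytic gradient (row vector as in Definition 1).
f = lambda v: np.exp(v[0]**2 - v[1]**2)
grad_f = lambda v: np.exp(v[0]**2 - v[1]**2) * np.array([2*v[0], -2*v[1]])

x = np.array([0.3, 0.7])   # expansion point (arbitrary choice)
d = np.array([1.0, -2.0])  # direction (arbitrary choice)

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    # First-order remainder o(t) = f(x + t d) - f(x) - t grad_f(x) d
    r = f(x + t*d) - f(x) - t * grad_f(x) @ d
    print(f"t = {t:.0e}   r/t = {r/t:.3e}")   # ratio tends to 0 with t
```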
Higher Regularity Conditions
In the classification of minimum or maximum points, additional regularity conditions are needed.
Definition 2 (second order regularity) Consider again a multivariate function f: \mathbb{R}^n \to \mathbb{R}, but now assume the following:
f \in C^2(\mathbb{R}^n), meaning f(\bm{x}) is twice continuously differentiable. Both the gradient vector \nabla f(\bm{x}) and the Hessian matrix \nabla^2 f(\bm{x}) exist.
The Hessian \nabla^2 f(\bm{x}) is Lipschitz continuous, i.e.,
\|\nabla^2 f(\bm{x}) - \nabla^2 f(\bm{y})\| \leq \gamma \|\bm{x} - \bm{y}\|, \qquad \forall \bm{x}, \bm{y} \in \mathbb{R}^n
With this stronger regularity assumption, the Taylor series can be expanded up to the second order as:
f(\bm{x} + t\bm{d}) = f(\bm{x}) + t\nabla f(\bm{x})\bm{d} + \frac{t^2}{2} \bm{d}^T \nabla^2 f(\bm{x}) \bm{d} + \mathcal{o}(t^2)
Here, the remainder term \mathcal{o}(t^2), which satisfies \lim_{t \to 0} \mathcal{o}(t^2)/t^2 = 0, does not require the existence of third-order derivatives but does depend on the Lipschitz continuity of the Hessian.
In summary, the first-order expansion requires f \in C^1 and the Lipschitz continuity of the gradient, while second-order expansions demand f \in C^2 and the Lipschitz continuity of the Hessian matrix. These conditions provide the necessary regularity to perform effective analyses of function behavior.
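The same numerical experiment extends to the second-order expansion: with the Hessian term included, the remainder must vanish faster than t^2. A sketch under the same assumptions (Python with numpy, same test function):

```python
import numpy as np

f = lambda v: np.exp(v[0]**2 - v[1]**2)
grad_f = lambda v: np.exp(v[0]**2 - v[1]**2) * np.array([2*v[0], -2*v[1]])

def hess_f(v):
    # Analytic Hessian of exp(x^2 - y^2)
    x, y = v
    e = np.exp(x**2 - y**2)
    return e * np.array([[4*x**2 + 2, -4*x*y],
                         [-4*x*y,     4*y**2 - 2]])

x = np.array([0.3, 0.7])
d = np.array([1.0, -2.0])

for t in [1e-1, 1e-2, 1e-3]:
    # Second-order remainder o(t^2)
    r = f(x + t*d) - f(x) - t * grad_f(x) @ d - 0.5 * t**2 * d @ hess_f(x) @ d
    print(f"t = {t:.0e}   r/t^2 = {r/t**2:.3e}")   # ratio tends to 0 with t
```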
A point \bm{x}^\star \in \Bbb{R}^n is considered a global minimum if
f(\bm{x}^\star) \leq f(\bm{x}) \quad \text{for all } \bm{x} \in \Bbb{R}^n.
Local Minimum
Definition of Local Minimum
A point \bm{x}^\star \in \mathbb{R}^n is classified as a local minimum of a function f: \mathbb{R}^n \to \mathbb{R} if:
f(\bm{x}^\star) \leq f(\bm{x}) \quad \text{for all } \bm{x} \in B(\bm{x}^\star, \delta),
where B(\bm{x}^\star, \delta) represents the open ball centered at \bm{x}^\star with radius \delta > 0. This implies that within a small neighborhood around \bm{x}^\star, the value of the function f(\bm{x}) is always greater than or equal to f(\bm{x}^\star).
Strict Local Minimum
If the inequality is strict, meaning:
f(\bm{x}^\star) < f(\bm{x}) \quad \text{for all } \bm{x} \in B(\bm{x}^\star, \delta) \text{ and } \bm{x} \neq \bm{x}^\star,
then \bm{x}^\star is classified as a strict local minimum. In this case, no other point in the neighborhood has a function value equal to or less than f(\bm{x}^\star).
Conditions for a Local Minimum
First-Order Conditions
For the first-order condition we assume first-order regularity (Definition 1). A necessary condition for \bm{x}^\star \in \mathbb{R}^n to be a local minimum is that the gradient of f at \bm{x}^\star vanishes. Mathematically, this is expressed as:
\nabla f(\bm{x}^\star) = \bm{0}
This condition is known as the first-order necessary condition. However, it is not sufficient to confirm that \bm{x}^\star is a minimum. This condition alone could indicate a minimum, maximum, or even a saddle point (a point that is neither a minimum nor a maximum).
Second-Order Conditions
To further classify \bm{x}^\star, we need to examine the second-order derivatives of the function and assume second-order regularity (Definition 2).
Necessary Conditions
Assuming that f \in C^2(\mathbb{R}^n), with Lipschitz continuous second derivatives, the Hessian matrix at \bm{x}^\star must be semi-positive definite, which means: \bm{d}^T \nabla^2 f(\bm{x}^\star) \bm{d} \geq 0 \quad \text{for all} \, \bm{d} \in \mathbb{R}^n
Sufficient Conditions
A sufficient condition for \bm{x}^\star to be a strict local minimum is that the Hessian matrix \nabla^2 f(\bm{x}^\star) is positive definite. That is: \bm{d}^T \nabla^2 f(\bm{x}^\star) \bm{d} > 0 \quad \text{for all} \, \bm{d} \in \mathbb{R}^n, \ \bm{d} \neq \bm{0}
If the Hessian is positive definite, then \bm{x}^\star is guaranteed to be a strict local minimum. This condition is equivalent to saying that all the eigenvalues of the Hessian matrix \nabla^2 f(\bm{x}^\star) are positive.
In summary:
- A point where the gradient vanishes may be a local minimum, maximum, or saddle point.
- To classify a point as a (strict) local minimum, the Hessian must be positive definite.
- If the Hessian matrix is not semi-positive definite the point is surely not a minimum point.
There is a grey zone when the point satisfies the necessary condition:
- the Hessian matrix is semi-positive definite
but not the sufficient condition, i.e.:
- the Hessian matrix is not positive definite
In this case the point may or may not be a minimum, and further investigation is required.
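These rules translate directly into a small test on the Hessian eigenvalues. The following sketch (assuming Python with numpy, and that the gradient has already been checked to vanish at the candidate point) mirrors the classification above:

```python
import numpy as np

def classify_for_minimum(hessian):
    # Assumes the gradient already vanishes at the candidate point.
    eig = np.linalg.eigvalsh(hessian)   # eigenvalues of the symmetric Hessian
    if np.all(eig > 0):
        return "strict local minimum (sufficient condition holds)"
    if np.all(eig >= 0):
        return "grey zone: necessary condition holds, further checks needed"
    return "not a minimum (a negative eigenvalue violates the necessary condition)"

print(classify_for_minimum(np.array([[2.0, 0.0], [0.0,  3.0]])))
print(classify_for_minimum(np.array([[2.0, 0.0], [0.0, -2.0]])))
print(classify_for_minimum(np.array([[2.0, 0.0], [0.0,  0.0]])))
```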
Lagrange Multipliers
In constrained optimization, the goal is to minimize a function f \in C^2(\mathbb{R}^n) (twice continuously differentiable) subject to m equality constraints defined by h_k \in C^2(\mathbb{R}^n). This problem can be formulated as:
\begin{aligned} \text{Minimize:} \qquad & f(\bm{x}) \\ \text{Subject to:} \qquad & h_k(\bm{x}) = 0, \quad k = 1, \dots, m \end{aligned}
This means that we want to find the point \bm{x} that minimizes the objective function f(\bm{x}) while also satisfying the m constraints given by h_k(\bm{x}) = 0 for each k.
The challenging analytical problem of constrained minimization can be simplified using the Lagrange multiplier theorem. The idea is to incorporate the constraints directly into the optimization problem through additional variables called Lagrange multipliers.
Let f be the function to be minimized, and let \bm{h} represent the constraint functions (so that f, \bm{h} \in C^2(\mathbb{R}^n)). If \bm{x}^\star is a local minimum of f that satisfies all the constraints (i.e., \bm{h}(\bm{x}^\star) = \bm{0}), then if \nabla \bm{h}(\bm{x}^\star) has maximum rank, there exist m scalars \lambda_1, \dots, \lambda_m such that:
\nabla f(\bm{x}^\star) = \sum_{k=1}^{m} \lambda_k \nabla h_k(\bm{x}^\star)
This equation indicates that at the optimal point \bm{x}^\star, the gradient of the objective function f is a linear combination of the gradients of the constraint functions h_k. The scalars \lambda_k are called Lagrange multipliers.
Lagrangian Function
The problem is transformed into a simpler form using the Lagrangian function \mathcal{L}, which incorporates both the objective function and the constraints:
\mathcal{L}(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \sum_{k=1}^{m} \lambda_k h_k(\bm{x})
Here, \bm{\lambda} = (\lambda_1, \dots, \lambda_m) is the vector of Lagrange multipliers. The goal now is to find stationary points of the Lagrangian \mathcal{L} by solving a system of equations derived from the first-order conditions.
Solving the Constrained Minimization Problem
To solve the problem, we need to find the points \bm{x}^\star that satisfy the Lagrange multiplier conditions. This involves solving the following system of equations:
\begin{cases} \nabla_{\bm{x}} \mathcal{L}(\bm{x}, \bm{\lambda}) = \nabla f(\bm{x}) - \sum_{k=1}^{m} \lambda_k \nabla h_k(\bm{x}) = \bm{0} \\ \nabla_{\bm{\lambda}} \mathcal{L}(\bm{x}, \bm{\lambda}) = -\bm{h}(\bm{x}) = \bm{0} \end{cases}
The first equation corresponds to finding points where the gradient of the Lagrangian \mathcal{L} with respect to \bm{x} is zero. The second equation ensures that the constraints h_k(\bm{x}) = 0 are satisfied.
All points \bm{x}^\star that solve this system are candidates for being local minima, maxima, or saddle points. Since we are searching for stationary points of the Lagrangian, further analysis is required to determine whether these points are indeed minima or maxima.
Second-Order Conditions
To determine whether a candidate point \bm{x}^\star is a local minimum in a constrained optimization problem, we must examine the second-order conditions. These conditions involve the Hessian matrix of the Lagrangian function. The second-order condition checks whether this Hessian is positive definite when restricted to the null space of the Jacobian of the constraints, \nabla \bm{h}(\bm{x}). This is necessary to ensure that the second derivative of the objective function along feasible directions is positive, which would confirm the existence of a local minimum.
Mathematically, the second-order sufficient condition is expressed as:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} > 0 \quad \text{for all} \, \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\}
Here \bm{z} is any vector in the null space (or kernel) of the constraint Jacobian, \ker\{\nabla \bm{h}(\bm{x}^\star)\}; the condition means that in the directions allowed by the constraints, the Hessian of the Lagrangian must be positive.
This condition ensures that the Lagrangian’s second derivative is positive in the feasible directions, confirming that the point \bm{x}^\star is a local minimum rather than a maximum or saddle point.
Necessary Condition
For \bm{x}^\star to be a local minimum, a necessary second-order condition is that the Hessian \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) is semi-positive definite in the feasible directions, which implies:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} \geq 0 \quad \forall \, \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\}
In other words, the Hessian of the Lagrangian must not have negative curvature along any direction that lies in the kernel of the constraint gradients. This is a necessary condition but not sufficient to confirm a local minimum.
Sufficient Condition
To sufficiently confirm that \bm{x}^\star is a strict local minimum, the Hessian \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) must be positive definite in the null space of \nabla \bm{h}(\bm{x}), meaning:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} > 0 \quad \forall \, \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\}
This condition guarantees that the point \bm{x}^\star is indeed a strict local minimum, as the Lagrangian’s second derivative is strictly positive in all directions allowed by the constraints.
Summary: First and Second-Order Conditions
In constrained optimization, we want to determine if a candidate point \bm{x}^\star is a local minimum for the function f(\bm{x}) subject to equality constraints h_k(\bm{x}) = 0, where k = 1, \dots, m. To do this, we check both first-order and second-order conditions using the Lagrangian:
\mathcal{L}(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \sum_{k=1}^{m} \lambda_k h_k(\bm{x})
1. First-Order Condition
The first-order necessary condition ensures that the gradient of the Lagrangian with respect to \bm{x} and the gradients of the constraints are aligned. This condition guarantees that \bm{x}^\star is a stationary point of the Lagrangian. Mathematically, this condition is:
\nabla_{\bm{x}} \mathcal{L}(\bm{x}^\star, \bm{\lambda}) = \nabla f(\bm{x}^\star) - \sum_{k=1}^{m} \lambda_k \nabla h_k(\bm{x}^\star) = \bm{0}
This means that the gradient of the objective function at \bm{x}^\star must lie in the span of the gradients of the constraints h_k(\bm{x}). This condition is necessary but not sufficient to confirm that \bm{x}^\star is a minimum.
2. Second-Order Conditions
To determine whether \bm{x}^\star is a local minimum, we need to examine the curvature of the Lagrangian through the second-order conditions. These conditions involve the Hessian of the Lagrangian, \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}, \bm{\lambda}), which describes how the function behaves near \bm{x}^\star.
Necessary Condition: The Hessian \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) must be semi-positive definite on the null space of the constraint Jacobian, \nabla \bm{h}(\bm{x}). Mathematically, for all \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\}:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} \geq 0
This condition is necessary to rule out saddle points but is not sufficient to confirm a minimum.
Sufficient Condition: To confirm that \bm{x}^\star is a strict local minimum, the Hessian \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) must be positive definite on the null space of the constraint gradients. That is, for all \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\}:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} > 0
This condition is sufficient to guarantee that \bm{x}^\star is a strict local minimum.
Summary of Conditions
First-Order Condition: The gradient of the objective function must lie in the span of the gradients of the constraints:
\nabla f(\bm{x}^\star) \in \text{span} \{ \nabla h_1(\bm{x}^\star), \dots, \nabla h_m(\bm{x}^\star) \}
Second-Order Necessary Condition: The Hessian of the Lagrangian must be semi-positive definite on the null space of the constraints:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} \geq 0
Second-Order Sufficient Condition: To confirm a strict local minimum, the Hessian of the Lagrangian must be positive definite on the null space of the constraints:
\bm{z}^T \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star, \bm{\lambda}) \bm{z} > 0
These conditions, together, are essential to verify whether a candidate solution to a constrained minimization problem is truly a local minimum.
Example 1 (constrained minimization problem) Let's consider the problem where we want to minimize the function f:\Bbb{R}^2\rightarrow \Bbb{R} subject to the constraint h, defined as
f(x,y) = \mathrm{e}^{x^2-y^2} \qquad,\qquad h(x,y) = x - y^2
In order to solve this problem we first need to build the Lagrangian (having only one constraint, \bm{\lambda} reduces to a scalar \lambda):
\mathcal{L}(x,y,\lambda) = \mathrm{e}^{x^2-y^2} - \lambda \big(x-y^2\big)
We now need to compute the stationary points of the Lagrangian, which means solving the following non-linear system of equations:
\begin{cases} \nabla_x \mathcal{L}(x,y,\lambda) = 2 x \mathrm{e}^{x^2-y^2} - \lambda = 0 \\ \nabla_y \mathcal{L}(x,y,\lambda) = -2 y \mathrm{e}^{x^2-y^2} + 2 \lambda y = 0 \\ \nabla_\lambda \mathcal{L}(x,y,\lambda) = -x + y^2 = 0 \end{cases}
\begin{aligned} \implies \quad & \big(x,y,\lambda\big)=\big(0,0,0\big), \\ \implies \quad & \big(x,y,\lambda\big)= \left(\frac{1}{2}, \frac{1}{\sqrt{2}}, \mathrm{e}^{-\frac{1}{4}}\right), \\ \implies \quad & \big(x,y,\lambda\big)= \left(\frac{1}{2}, -\frac{1}{\sqrt{2}}, \mathrm{e}^{-\frac{1}{4}}\right) \end{aligned}
To determine whether these points are local maxima or minima, we first compute the gradient of the constraint map and the Hessian of the Lagrangian with respect to the variables \bm{x} = (x,y):
\begin{aligned} \nabla h(x,y) & = (1,-2y) \\ \nabla^2_{(x,y)} \mathcal{L} & = \begin{bmatrix} (4x^2+2)\mathrm{e}^{x^2-y^2} & -4xy \mathrm{e}^{x^2-y^2} \\ -4xy \mathrm{e}^{x^2-y^2} & (4y^2-2)\mathrm{e}^{x^2-y^2} + 2\lambda \end{bmatrix} \end{aligned}
Now we have to check each stationary point independently:
when x=y=\lambda = 0 we have that \nabla h = (1,0) while \nabla^2_{(x,y)} \mathcal{L} = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} By computing the null space of the vector (1,0) we see that it consists of all vectors of the form (0,\alpha); we can now check whether the point is a maximum or a minimum by determining if the matrix \nabla^2_{(x,y)}\mathcal{L} is positive or negative definite on this subspace: \big(0 \ \ \alpha\big) \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} \begin{pmatrix} 0 \\ \alpha \end{pmatrix} = -2\alpha^2 \leq 0 for all \alpha\in\Bbb{R}. The Hessian is negative definite on the kernel, so the point (x,y) = (0,0) is a local maximum.
evaluating the second point x = \frac{1}{2},\quad y = \frac{1}{\sqrt{2}}, \quad\textrm{and}\quad \lambda = \mathrm{e}^{-\frac{1}{4}} we can compute the gradient \nabla h = (1, - \sqrt 2) of the constraint map, which determines a null space of the form (\alpha \sqrt{2},\alpha). The Hessian of the Lagrangian is positive definite on this subspace; in fact \mathrm{e}^{-\frac 1 4} \begin{pmatrix} \alpha \sqrt 2 & \alpha \end{pmatrix} \begin{bmatrix} 3 & -\sqrt 2 \\ -\sqrt 2 & 2 \end{bmatrix} \begin{pmatrix} \alpha \sqrt 2 \\ \alpha \end{pmatrix} = 4 \mathrm{e}^{-\frac 14} \alpha^2 > 0 for all \alpha \neq 0. This means that the point is a local minimum.
considering instead the last point x = \frac{1}{2},\quad y = - \frac{1}{\sqrt 2} \quad\textrm{and}\quad \lambda = \mathrm{e}^{-\frac{1}{4}} we have a similar gradient \nabla h = (1,\sqrt 2) that determines a kernel of the form (\alpha \sqrt 2,-\alpha). Evaluating the Hessian on the null space basis we see that the matrix is positive definite, in fact \mathrm{e}^{-\frac{1}{4}} \begin{pmatrix} \alpha \sqrt 2 & -\alpha \end{pmatrix} \begin{bmatrix} 3 & \sqrt 2 \\ \sqrt 2 & 2 \end{bmatrix} \begin{pmatrix} \alpha \sqrt 2 \\ -\alpha \end{pmatrix} = 4 \mathrm{e}^{-\frac 14} \alpha^2 > 0 for all \alpha \neq 0
We can see that both points
\left(\frac 12, \frac 1{\sqrt 2}\right) \quad\textrm{and}\quad \left(\frac 12, -\frac 1{\sqrt 2}\right)
are local minima, and they are in fact also global minima because
f\left(\frac{1}{2}, \frac 1{\sqrt{2}}\right) = f\left(\frac{1}{2}, - \frac 1{\sqrt{2}}\right) = \mathrm{e}^{-\frac{1}{4}}.
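As a cross-check, the stationarity of the three points can be verified symbolically; a sketch assuming Python with sympy:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
L = sp.exp(x**2 - y**2) - lam * (x - y**2)   # Lagrangian of Example 1
grad = [sp.diff(L, v) for v in (x, y, lam)]

candidates = [
    {x: 0, y: 0, lam: 0},
    {x: sp.Rational(1, 2), y:  1/sp.sqrt(2), lam: sp.exp(-sp.Rational(1, 4))},
    {x: sp.Rational(1, 2), y: -1/sp.sqrt(2), lam: sp.exp(-sp.Rational(1, 4))},
]
for c in candidates:
    # Every component of the gradient must vanish at a stationary point.
    print([sp.simplify(g.subs(c)) for g in grad])   # expected: [0, 0, 0]
```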
Example 2 (determination of the non-linear system) Given the problem
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,z) = x - y + z^2 \\ \textrm{subject to:} \qquad & h_1(x,y,z) = x^2 + y^2 - 2 = 0 \\ & h_2(x,y,z) = x+z-1 = 0 \end{aligned}
the solution can be computed by first determining the Lagrangian \mathcal{L} = f - \lambda_1 h_1 - \lambda_2 h_2
of the problem, which is
\mathcal{L}(x,y,z, \lambda_1, \lambda_2) = x-y+z^2 - \lambda_1\big(x^2+y^2-2\big) - \lambda_2\big(x+z-1\big)
The first order necessary condition related to the Lagrange multipliers states that a point, to be a minimum, must be a stationary one, so the candidate minimum points for this problem can be computed by solving the following system of non-linear equations:
\begin{cases} 1 - 2\lambda_1 x - \lambda_2 = 0 & : \dfrac{\partial}{\partial x}\mathcal{L} \\[1em] -1 - 2\lambda_1 y = 0 & : \dfrac{\partial}{\partial y} \mathcal{L} \\[1em] 2z - \lambda_2 = 0 & : \dfrac{\partial}{\partial z} \mathcal{L} \\[1em] x^2 + y^2 = 2 & : -\dfrac{\partial}{\partial \lambda_1} \mathcal{L} =h_1 \\[1em] x + z = 1 & : -\dfrac{\partial}{\partial \lambda_2}\mathcal{L} = h_2 \end{cases}
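The system can then be handed to a numerical root finder; a sketch assuming Python with sympy, where the initial guess is a rough estimate of the candidate point that appears later in Example 6 (other guesses would converge to other stationary points):

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda_1 lambda_2', real=True)
L = (x - y + z**2) - l1*(x**2 + y**2 - 2) - l2*(x + z - 1)
eqs = [sp.diff(L, v) for v in (x, y, z, l1, l2)]   # stationarity conditions

# Newton iteration from a rough guess.
sol = sp.nsolve(eqs, (x, y, z, l1, l2), (0.4, 1.4, 0.6, -0.4, 1.3))
print(sol)   # approximately (0.366, 1.366, 0.634, -0.366, 1.268)
```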
Sylvester theorem
The tedious and error-prone step of finding the minimum with the Lagrange multipliers is the one performed to determine whether the Hessian is (or is not) semi-positive definite on the kernel
\ker\{\nabla \bm{h}(\bm{x}^\star)\}
of the gradient of the constraint map. In fact, for every stationary point \bm{x}^\star of the Lagrangian we have to check that
\bm{z}^t \, \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^\star) \, \bm{z} \geq 0 \qquad \forall \bm{z}\in \ker\{\nabla \bm{h}(\bm{x}^\star)\}
For the sake of simplicity, from now on we denote the matrix \nabla_{\bm{x}}^2 \mathcal{L}(\bm{x}^\star,\bm{\lambda}^\star) as \bm{A}. We can note that any vector \bm{z} \in \ker\{\nabla \bm{h}(\bm{x}^\star)\} (and from now on we refer to the matrix \nabla \bm{h} as \bm{B}) can be expressed as a linear combination of the vectors \bm{k}_i forming a basis of \ker\{\bm{B}\}:
\bm{z} = \bm{k}_1 \alpha_1 + \bm{k}_2 \alpha_2 + \dots + \bm{k}_p \alpha_p = \bm{K} \bm{\alpha} \qquad \bm{\alpha} \in \Bbb{R}^p
We can see that this expression reduces to the multiplication of a matrix \bm{K} \in \Bbb{R}^{n\times p} (whose columns are the basis vectors \bm{k}_i of the kernel) by a p-dimensional vector \bm{\alpha}, where p = n - m is the dimension of the kernel.
We can now rewrite the second order necessary condition considering that
\bm{z}^t \bm{A} \bm{z} = \bm{\alpha}^t \bm{K}^t \bm{A} \bm{K} \bm{\alpha} = \bm{\alpha}^t \bm{C} \bm{\alpha}, \quad \bm{C} := \bm{K}^t\bm{A}\bm{K} \in \Bbb{R}^{p\times p}
Example 3 Let’s consider the numerical problem when the values of the matrix
\bm{A} = \nabla_{\bm{x}}\mathcal{L} \quad\textrm{and}\quad \bm{B}=\nabla \bm{h}
are given with values
\bm{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix} \qquad \bm{B} = \begin{bmatrix}1&0&0\end{bmatrix}
The first step is to manually compute the kernel of the matrix \bm{B} in \Bbb{R}^3 by solving the linear system
\begin{bmatrix} 1 &0 &0 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}= \bm{0} \quad \implies \quad \begin{cases} z_1 = 0 \\ z_2 = \alpha \\ z_3 = \beta \end{cases} \quad \alpha,\beta\in\Bbb{R}
At this point we can rewrite the kernel of B as a linear combination of its basis vectors:
\ker\{\bm{B}\} = \begin{bmatrix}0 \\ \alpha \\ \beta\end{bmatrix} = \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \alpha + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\beta = \underbrace{ \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1\end{bmatrix}}_{=\bm{K}} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}
The last step is to compute the matrix C, which is then analyzed to determine whether it is (semi-)positive definite:
\begin{aligned} \bm{K}^t\bm{A}\bm{K} & = \begin{bmatrix}0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \\ \bm{C} & = \begin{bmatrix} 0&1\\1&0 \end{bmatrix} \end{aligned}
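The same reduction is quickly reproduced numerically; a sketch assuming Python with numpy and scipy (null_space returns an orthonormal kernel basis, so C may differ from the hand computation by a change of basis, but its eigenvalues do not):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 0., 1.],
              [3., 1., 0.]])
B = np.array([[1., 0., 0.]])

K = null_space(B)        # columns form a basis of ker(B), here a 3x2 matrix
C = K.T @ A @ K          # reduced matrix to test for definiteness
print(np.linalg.eigvalsh(C))   # {-1, 1}: C is indefinite, not a minimum
```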
This construction reduces the problem to analyzing the matrix C (which is smaller than the original matrix A) and determining whether it is (semi-)positive definite, using one of two methods:
since \bm{C} is a symmetric matrix we know for sure that it can be diagonalised as \bm{T}^t \bm{\Lambda}\bm{T} (where \bm{\Lambda} is a diagonal matrix containing all the eigenvalues of \bm{C}); considering the expression \bm{\alpha}^t \bm{T}^t \bm{\Lambda} \bm{T} \bm{\alpha}, in order to have a semi-positive definite matrix all the eigenvalues \lambda_i must be non-negative. If \lambda_i > 0 \ \forall i then the matrix is positive definite and the point is a local minimum;
the other option is to use the Sylvester theorem, which states:
a symmetric matrix \bm{A}\in \Bbb{R}^{n\times n} is positive definite if and only if all the leading principal minors of \bm{A} are strictly positive
(note that if one minor is equal to zero, no information can be retrieved with this method).
Example 4 (application of the Sylvester theorem) Let’s consider now the matrix
\bm{A} = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}
In order to determine whether \bm{A} is positive definite we have to compute all the leading principal minors, starting from the first one \bm{A}_1, which consists only of the element a_{11} of \bm{A}, and so
\bm{A}_1 = \det[1] = 1 > 0
The second (and last) leading principal minor of \bm{A} is the determinant of the whole matrix, and it happens that
\bm{A}_2 = \det \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} = 1 > 0
Since all the minors are greater than zero, the matrix \bm{A} is positive definite.
The same result can be achieved by determining all the eigenvalues of the matrix and showing they are all positive; we can in fact see that the characteristic polynomial of \bm{A} is equal to
\begin{aligned} p(\lambda) & = \det \big[\bm{A}-\lambda \bm{I}\big] \\ & = \det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 5-\lambda \end{bmatrix} \\ & = \lambda^2 - 6\lambda + 1 \end{aligned}
and thus \lambda_{1,2} = \frac{6\pm \sqrt{36-4}}{2} = 3 \pm \frac{\sqrt{32}}{2}
noting that
\frac{\sqrt{32}}{2} < \frac{\sqrt{36}}{2} = 3
all the eigenvalues \lambda_1,\lambda_2 are strictly positive and so \bm{A} is positive definite.
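Both checks are easy to automate; a sketch assuming Python with numpy:

```python
import numpy as np

def sylvester_positive_definite(A):
    # Leading principal minors must all be strictly positive.
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

A = np.array([[1., 2.],
              [2., 5.]])
print(sylvester_positive_definite(A))   # True: the minors are 1 and 1
print(np.linalg.eigvalsh(A))            # both eigenvalues strictly positive
```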
Inequality constraints
Let’s introduce now the problem of minimizing a function f(\bm{x}):\Bbb{R}^n\rightarrow R considering a set of p inequalities constrains in the form g_k(\bm{x}) \geq 0.
\begin{aligned} \textrm{minimize:} \qquad & f(\bm{x}) \\ \textrm{subject to:}\qquad & g_k(\bm{x}) \geq 0 \qquad k = 1,\dots,p \end{aligned}
The first approach to this problem is to transform the inequality constraints g_k into equality ones (so having h_k(\bm{x}) = 0). This can be done by observing that each constraint can be expressed as
g_k(\bm{x}) = \varepsilon_k^2 \geq 0 \quad \implies \quad h_k(\bm{x},\varepsilon_k) = g_k(\bm{x}) - \varepsilon_k^2 = 0
Doing this for each constraint we obtain a minimization problem that depends not only on the original variables \bm{x} but also on the so-called slack variables \bm{\varepsilon}:
\begin{aligned} \textrm{minimize:} \qquad & f(\bm{x}) \\ \textrm{subject to:}\qquad & h_k(\bm{x}, \bm{\varepsilon}) = 0 \qquad k = 1,\dots,p \end{aligned}
This approach increases the computational cost because the number of variables to minimize over grows from n to n+p.
We used the expression \varepsilon_k^2 and not \varepsilon_k because with this choice we don't need to specify the additional inequality \varepsilon_k \geq 0.
Example 5 Let’s consider the following problem:
\begin{aligned} \textrm{minimize:} \qquad & f(x,y) = x^2 \\ \textrm{subject to:}\qquad & x^2+y^2\leq 1 \\ & x+y \geq 0 \end{aligned}
In order to solve this problem we have to reduce the inequality constraints to equality ones; the first constraint has to be expressed in the form g_1 \geq 0, and so
\begin{aligned} g_1(x,y) & = 1 -x^2-y^2 \geq 0 \\ & \Downarrow \\ h_1(x,y,\varepsilon_1) & = 1-x^2-y^2-\varepsilon_1^2 \end{aligned}
In the same way it's possible to express the second inequality as the constraint
h_2(x,y,\varepsilon_2) = x+y-\varepsilon_2^2
With these substitutions the problem reduces to the form
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,\varepsilon_1,\varepsilon_2) = x^2 \\ \textrm{subject to:}\qquad & h_1(x,y,\varepsilon_1,\varepsilon_2) = 1- x^2 - y^2 - \varepsilon_1^2 = 0 \\ & h_2(x,y,\varepsilon_1,\varepsilon_2) = x+y - \varepsilon_2^2 = 0 \end{aligned}
With the problem thus stated we can build the Lagrangian, which depends on both the original variables x,y and the slack variables \epsilon_1,\epsilon_2:
\begin{aligned} \mathcal{L}(\underbrace{x,y,\epsilon_1,\epsilon_2}_{\bm{x}},\underbrace{\lambda_1,\lambda_2}_{\bm{\lambda}}) &= f(\bm{x}) - \lambda_1 h_1(\bm{x}) - \lambda_2 h_2(\bm{x}) \\ & = x^2 - \lambda_1 \big( 1- x^2-y^2-\epsilon_1^2 \big) - \lambda_2 \big( x+y-\epsilon_2^2 \big) \end{aligned}
To find the local minimum points we can start from the first necessary condition of the Lagrange multipliers, determining all the stationary points such that \nabla \mathcal{L} = \bm{0}; in practice this means solving the following non-linear system of 6 equations in 6 variables:
\left\{ \begin{aligned} \partial \mathcal{L}/\partial x : && 2x +2\lambda_1x -\lambda_2 &= 0 \\ \partial \mathcal{L}/\partial y : && 2\lambda_1 y - \lambda_2 &= 0\\ \partial \mathcal{L}/\partial \epsilon_1 : && 2\lambda_1\epsilon_1 &= 0 \\ \partial \mathcal{L}/\partial \epsilon_2 : && 2\lambda_2 \epsilon_2 &= 0 \\ \partial \mathcal{L}/\partial \lambda_1 : && x^2+y^2 + \epsilon_1^2-1 &= 0 \\ \partial \mathcal{L}/\partial \lambda_2 : && x + y - \epsilon_2^2 &= 0 \end{aligned} \right.
Solving minimization problems with inequality constraints can be tricky using the Lagrange multiplier method, so other approaches (like the KKT conditions explained later) are usually preferred. However, this kind of system can be solved by cases: considering the third and fourth equations we can discriminate the solutions according to the various combinations in which \epsilon_1, \epsilon_2 are equal (or not) to zero.
Considering for example the case \epsilon_1 = \epsilon_2 = 0, the non-linear system reduces to the form
\begin{cases} 2x + 2\lambda_1 x - \lambda_2 = 0 \\ 2\lambda_1 y - \lambda_2 = 0 \\ x^2+y^2-1 = 0 \\ x + y = 0 \end{cases}
Since the first two expressions are linear with respect to \lambda_1,\lambda_2, it's possible to determine their values directly in terms of x,y:
\lambda_1 = \frac{x}{y-x} \qquad \lambda_2 = 2 \frac{xy}{y-x}
Considering instead the last two equations, we can proceed by substitution: from the last equation we have y = -x, and so
1-x^2-x^2 = 0 \qquad \implies \quad x = \pm \frac{1}{\sqrt{2}}
At the end of this process we have found two stationary points of the Lagrangian:
\begin{aligned} \big(x,y,\epsilon_1,\epsilon_2,\lambda_1,\lambda_2\big) & = \left( \frac 1 {\sqrt 2} , - \frac{1}{\sqrt 2},0,0,-\frac 1 2, \frac{\sqrt 2}{2} \right), \\ \big(x,y,\epsilon_1,\epsilon_2,\lambda_1,\lambda_2\big) & = \left( - \frac 1 {\sqrt 2} , \frac{1}{\sqrt 2},0,0,-\frac 1 2, -\frac{\sqrt 2}{2} \right). \end{aligned}
Solving this system of equations symbolically with a computer, the only real solution family of the problem (which determines the local minima) is expressed in the form
\big(x,y,\epsilon_1,\epsilon_2,\lambda_1,\lambda_2\big)=\left( 0, \sqrt{-\epsilon_1^2 + 1 }, \epsilon_1 ,\sqrt[4]{1 - \epsilon_1^2}, 0, 0 \right)
In order for the square roots to exist the value \epsilon_1^2 must lie in the range [0,1], and so the local minima of the problem are found on the vertical segment
(x,y) = (0,k) \qquad k \in [0,1]
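The solution family can be checked by substituting it back into the stationarity system; a sketch assuming Python with sympy, where t stands for the free slack value \epsilon_1:

```python
import sympy as sp

x, y, e1, e2, l1, l2 = sp.symbols('x y epsilon_1 epsilon_2 lambda_1 lambda_2',
                                  real=True)
L = x**2 - l1*(1 - x**2 - y**2 - e1**2) - l2*(x + y - e2**2)
grad = [sp.diff(L, v) for v in (x, y, e1, e2, l1, l2)]

t = sp.symbols('t', positive=True)   # free parameter, plays the role of epsilon_1
family = {x: 0, y: sp.sqrt(1 - t**2), e1: t,
          e2: (1 - t**2)**sp.Rational(1, 4), l1: 0, l2: 0}
print([sp.simplify(g.subs(family)) for g in grad])   # expected: six zeros
```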
Karush-Kuhn-Tucker conditions
The Karush-Kuhn-Tucker (KKT) conditions are a set of necessary and sufficient conditions (first and second order) that describe whether a point \bm{x}^\star is a minimum point of a minimization problem subject to equality and inequality constraints, i.e. for the problem
\begin{aligned} \textrm{minimize:} \qquad & f(\bm{x}) \\ \textrm{subject to:}\qquad \ & h_k(\bm{x}) = 0 \qquad && k=1,\dots, m\\ & g_k(\bm{x}) \geq 0 &&k = 1,\dots,p \end{aligned}
where f,h_k,g_k: \Bbb{R}^n\rightarrow \Bbb{R} are real-valued functions.
In order to describe these conditions it's important to define the **set of active constraints** \mathcal{A} as
\mathcal{A}(\bm{x}^\star) = \big\{ k \ | \ g_k(\bm{x}^\star) = 0 \big\}
This means that a constraint g_k\in C(\Bbb{R}^n) is said to be active at the point \bm{x}^\star if g_k(\bm{x}^\star) = 0; in practical terms, the point sits exactly on the boundary between satisfying the constraint strictly and violating it.
Qualified constraints
Given \bm{g}\in C^1(\Bbb{R}^n,\Bbb{R}^p) and \bm{h} \in C^1(\Bbb{R}^n,\Bbb{R}^m), respectively the inequality and equality constraint maps, a point \bm{x}^\star is defined qualified if the gradients of the active constraints (together with the equality ones) are linearly independent, i.e. the vectors
\big\{ \nabla g_k(\bm{x}^\star) \ : \ k \in \mathcal A(\bm{x}^\star) \big\} \bigcup \big\{ \nabla h_1(\bm{x}^\star),\dots, \nabla h_m(\bm{x}^\star) \big\}
are linearly independent
Example 6 (qualification of a point) Consider the minimization problem of Example 2, stated as
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,z) = x - y + z^2 \\ \textrm{subject to:} \qquad & h_1(x,y,z) = x^2 + y^2 - 2 = 0 \\ & h_2(x,y,z) = x+z-1 = 0 \end{aligned}
and the candidate minimum point (resulting from the solution of the non-linear system)
\begin{aligned} x^\star &= \frac{\sqrt 3}{2} - \frac{1}{2}\\ y^\star &= \frac 1 2 + \frac{\sqrt 3}{2} \\ z^\star &= \frac 3 2 - \frac{\sqrt 3}{2} \\ \lambda_1^\star & = \frac{1}{2} - \frac{\sqrt3}{2} \\ \lambda_2^\star &= 3-\sqrt{3} \end{aligned}
In order to check whether the chosen point is qualified it's necessary to verify that the gradients of the equality constraints and of the active inequality ones (absent in this case, since there are no inequality constraints) are linearly independent. To do so we first build the matrix H of the gradients of the constraints:
H(x,y,z) = \begin{bmatrix} \nabla h_1(x,y,z) \\ \nabla h_2(x,y,z) \end{bmatrix} = \begin{bmatrix} 2x & 2y & 0 \\ 1 & 0 & 1 \end{bmatrix}
\implies \qquad H^\star= H(x^\star,y^\star,z^\star) = \begin{bmatrix} \sqrt 3 - 1 & 1 + \sqrt 3 & 0 \\ 1 & 0 & 1 \end{bmatrix}
It’s straightforward to see that the lines of the matrix H^\star are linearly independent, considering in fact the rank of H^\star by analysing the columns we see that’s equal to 2, so equalizing the number of the constraints.
First order necessary condition
Let f\in C^1(\Bbb{R}^n) be a function to minimize, subject to the inequality \bm{g}\in C^1(\Bbb{R}^n,\Bbb{R}^p) and equality \bm{h} \in C^1(\Bbb{R}^n,\Bbb{R}^m) constraint maps. If \bm{x}^\star satisfies the constraint qualification, then a necessary condition for \bm{x}^\star to be a local minimum is that there exist m+p scalars \lambda_k,\mu_k such that all the following conditions are satisfied:
\begin{aligned} \nabla_{\bm{x}} \mathcal{L} \big(\bm{x}^\star,\bm{\lambda}^\star,\bm{\mu}^\star\big) & = \bm{0} \\ h_k\big(\bm{x}^\star\big) & = 0 \qquad && k = 1,\dots, m\\ g_k\big(\bm{x}^\star\big) & \geq 0 && k = 1,\dots, p \\ \mu_k^\star g_k\big(\bm{x}^\star\big) & = 0 && k = 1,\dots, p \\ \mu_k^\star & \geq 0 && k = 1,\dots,p \end{aligned} \tag{1}
where the Lagrangian is in this case defined as
\mathcal{L}(\bm{x},\bm{\lambda},\bm{\mu}) = f(\bm{x}) - \sum_{k=1}^p \mu_k g_k(\bm{x}) - \sum_{k=1}^m \lambda_k h_k(\bm{x})
Example 7 Solving minimization problems with the KKT conditions is generally easier; considering the minimization problem of Example 5, the Lagrangian now depends on only 4 parameters (instead of 6 as in that example):
\mathcal{L}(x,y,\mu_1,\mu_2) = x^2 - \mu_1\big(1-x^2-y^2\big) - \mu_2\big(x+y\big)
Using the first order KKT condition the main non linear system to solve is in the form
\left\{\begin{aligned} \left.\begin{aligned} 2x +2\mu_1x - \mu_2 &= 0 \\ 2 \mu_1 y - \mu_2 &= 0 \end{aligned} \right\} & \quad \nabla_{\bm{x}} \mathcal{L}\big(\bm{x}, \bm{\lambda}\big) \\ \left. \begin{aligned} \mu_1\big(1-x^2-y^2\big) & = 0 \\ \mu_2 \big(x+y\big) & = 0 \end{aligned} \right\} & \quad \mu_k g_k\big(\bm{x} \big) = 0\\ \mu_1,\mu_2 \geq 0 \quad & \end{aligned} \right.
This system of four non-linear equations (the conditions \mu_1,\mu_2\geq0 are checked afterwards) can be solved by cases:
if \mu_1 = \mu_2 = 0 then the last two equalities are always verified, while the first two become
\begin{cases} 2x = 0 \\ 0 = 0 \end{cases} this means that x = 0, while y is a free parameter that must satisfy both constraints x+y\geq 0 (so that y \geq 0) and 1-x^2-y^2 \geq 0 (so 1-y^2 \geq 0, which determines -1 \leq y \leq 1): considering both conditions at the same time we have \begin{aligned} & \begin{cases} y \geq 0 \\ -1 \leq y \leq 1 \end{cases} \quad \implies \quad \\[1em] & \textrm{solution: } x = \mu_1 = \mu_2 = 0 \quad y \in [0,1] \end{aligned}
when considering \mu_1 \neq 0 and \mu_2 = 0 the system of equations becomes \begin{cases} 2x + 2 \mu_1x = 0 \\ 2\mu_1 y = 0 \\ \mu_1\big(1-x^2-y^2\big) = 0 \end{cases} Given the assumption on \mu_1, from the second equation we determine that y must be equal to 0, and so the third equation becomes 1-x^2 = 0, whose solutions are x = \pm 1. Since the first equation can be factorised as 2x(1+\mu_1) = 0 (and x\neq 0), in order to satisfy the relation \mu_1 must be -1; but this implies that \mu_1 is negative, which is unacceptable, so for \mu_1 \neq 0,\mu_2=0 no solutions of the system can be found;
considering instead \mu_1 = 0 and \mu_2 \neq 0 the system becomes \begin{cases} 2x - \mu_2 = 0 \\ -\mu_2 = 0 \\ \mu_2(x+y) = 0 \end{cases} The second equation forces \mu_2 = 0, contradicting the assumption \mu_2 \neq 0, so this case adds no solutions beyond those of the first one;
considering both variables \mu_1,\mu_2 \neq 0 we have to solve the full system; considering the last two equations we have that \begin{aligned} & \begin{cases} x+y = 0 \\ 1-x^2-y^2 = 0 \end{cases} \\ & \implies\quad x = -y \\ & \implies\quad 1-2x^2 = 0 \end{aligned} and so the two possible solutions for the system are (x,y) = \left( \pm \frac{1}{\sqrt{2}}, \mp \frac{1}{\sqrt{2}} \right). Considering the first case, where x = 1/\sqrt 2, the first two equations become \begin{aligned} & \begin{cases} \frac{2}{\sqrt 2} (1+\mu_1) - \mu_2 = 0 \\ -\frac 2 {\sqrt 2} \mu_1 - \mu_2 = 0 \end{cases} \\ & \implies \quad \mu_2 = - \frac 2 {\sqrt 2} \mu_1 \\ & \implies \quad \frac{4}{\sqrt 2} \mu_1 + \frac{2}{\sqrt 2} = 0 \\ & \implies \quad \mu_1 = - \frac{1}{2} \qquad \mu_2 = \frac{1}{\sqrt{2}} \end{aligned} In this case \mu_1 < 0 is an unacceptable solution of the problem; considering also the point x = -1/\sqrt{2} the system becomes \begin{aligned} & \begin{cases} -\dfrac{2}{\sqrt 2} (1+\mu_1) - \mu_2 = 0 \\ \dfrac{2}{\sqrt 2} \mu_1 - \mu_2 = 0 \end{cases} \\ & \implies \quad \mu_1 = -\frac{1}{2}\quad \mu_2 = - \frac{1}{\sqrt{2}} \end{aligned} which is likewise unacceptable.
With all these considerations done, the local minima lie on the points
(x,y) = (0,k) \qquad k\in[0,1]
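The candidate segment can be verified against the first order KKT conditions by direct substitution; a sketch assuming Python with sympy, where k parametrizes the segment:

```python
import sympy as sp

x, y, m1, m2 = sp.symbols('x y mu_1 mu_2', real=True)
g1, g2 = 1 - x**2 - y**2, x + y
L = x**2 - m1*g1 - m2*g2

k = sp.symbols('k', nonnegative=True)   # point (0, k) on the candidate segment
pt = {x: 0, y: k, m1: 0, m2: 0}

print([sp.diff(L, v).subs(pt) for v in (x, y)])           # gradient: [0, 0]
print([(m*g).subs(pt) for m, g in [(m1, g1), (m2, g2)]])  # complementarity: [0, 0]
print(g1.subs(pt), ',', g2.subs(pt))   # 1 - k**2 and k: both >= 0 for k in [0, 1]
```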
Second order necessary conditions
Given a function f \in C^2(\Bbb{R}^n) to minimize, subject to the equality \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m) and inequality \bm{g}\in C^2(\Bbb{R}^n,\Bbb{R}^p) constraint maps, a necessary condition for a point \bm{x}^\star (that satisfies the constraints) to be a local minimum is that there exist m+p scalars \lambda_i,\mu_i satisfying the first order conditions (Equation 1) and
\bm{z}^t \, \nabla^2_{\bm{x}} \mathcal{L}\big(\bm{x}^\star,\bm{\lambda}^\star,\bm{\mu}^\star\big) \, \bm{z} \geq 0 \tag{2}
for all vector \bm{z} such that
\begin{aligned} (i) \qquad && \nabla h_k(\bm{x}^\star) \bm{z} & = 0 && k = 1,\dots,m \\ (ii) \qquad &&\nabla g_k(\bm{x}^\star) \bm{z} & = 0 && \textrm{for all } k\in \mathcal{A}(\bm{x}^\star) \textrm{ and } \mu_k > 0 \\ (iii) \qquad &&\nabla g_k(\bm{x}^\star) \bm{z} & \geq 0 && \textrm{for all } k\in \mathcal{A}(\bm{x}^\star) \textrm{ and } \mu_k = 0 \end{aligned}
In general conditions (ii) and (iii) are hard to verify; they can be replaced with the expression below, which shrinks the set of test vectors \bm{z} and therefore yields a weaker (but easier to check) necessary condition, at the price of accepting candidate points that may not be minima:
\nabla g_k(\bm{x}^\star) \bm{z} = 0 \qquad \textrm{for all } k \in \mathcal{A}(\bm{x}^\star)
In other words, the second order necessary condition (and similarly the sufficient one) states (from Equation 2) that the matrix \nabla^2_{\bm{x}} \mathcal{L} must be semi-positive definite on the kernel of the active and equality constraints.
Example 8 (kernel of the active constraints) Continuing Example 2 and Example 6, at the minimum candidate point (x^\star,y^\star,z^\star) we have the following matrix for the active constraints:
H^\star= \begin{bmatrix} \sqrt{3}- 1 & 1 + \sqrt{3} & 0 \\ 1 & 0 & 1 \end{bmatrix}
Computing the kernel of the active constraints means solving the following system of linear equations, determined by H^\star \bm{x} = \bm{0}, and so
\begin{bmatrix} \sqrt 3 - 1 & 1 + \sqrt 3 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}
resulting in
\begin{cases} (\sqrt 3 - 1) x + (1 + \sqrt 3) y = 0 \\ x + z = 0 \end{cases}
Having 3 unknowns but only 2 equations means that we have to solve the problem in parametric form; choosing z = t we have x = -t and so y = t \frac{\sqrt 3 - 1}{\sqrt 3 + 1}; choosing t = \sqrt 3 + 1 (to simplify the results) we obtain the generator K of the kernel as
\bm{K} = \begin{pmatrix} -\sqrt 3 - 1 \\ \sqrt 3 - 1 \\ \sqrt 3 +1 \end{pmatrix}
Second order sufficient conditions
Given a function f \in C^2(\Bbb{R}^n) to minimize, subject to the equality \bm{h}\in C^2(\Bbb{R}^n,\Bbb{R}^m) and inequality \bm{g} \in C^2(\Bbb{R}^n,\Bbb{R}^p) constraint maps, a sufficient condition for a point \bm{x}^\star (that satisfies the constraints) to be a strict local minimum is that there exist m+p scalars \lambda_i,\mu_i satisfying the first order conditions (Equation 1) and
\bm{z}^t \, \nabla^2_{\bm{x}} \mathcal{L}\big(\bm{x}^\star,\bm{\lambda}^\star,\bm{\mu}^\star\big) \, \bm{z} > 0
for all vector \bm{z} such that
\begin{aligned} (i) \qquad && \nabla h_k(\bm{x}^\star) \bm{z} & = 0 && k = 1,\dots,m \\ (ii) \qquad &&\nabla g_k(\bm{x}^\star) \bm{z} & = 0 && \textrm{for all } k\in \mathcal{A}(\bm{x}^\star) \textrm{ and } \mu_k > 0 \\ (iii) \qquad &&\nabla g_k(\bm{x}^\star) \bm{z} & \geq 0 && \textrm{for all } k\in \mathcal{A}(\bm{x}^\star) \textrm{ and } \mu_k = 0 \end{aligned}
As in the previous case, in order to simplify the calculation condition (iii) can be dropped; the resulting test is more conservative, so some genuine local minima may fail it.
Example 9 (second order check) Given the problem of Example 2, carried on up to Example 8 and stated as
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,z) = x - y + z^2 \\ \textrm{subject to:} \qquad & h_1(x,y,z) = x^2 + y^2 - 2 = 0 \\ & h_2(x,y,z) = x+z-1 = 0 \end{aligned}
we found a candidate minimum point at
\begin{aligned} x^\star &= \frac{\sqrt 3}{2} - \frac 1 2 \\ y^\star &= \frac 1 2 + \frac{\sqrt 3} 2 \\ z^\star & = \frac 3 2 - \frac{\sqrt 3}{2} \\ \lambda_1^\star &= \frac 1 2 - \frac{\sqrt3}{2} \\ \lambda_2^\star &= 3-\sqrt 3 \end{aligned}
which presents the matrix \bm{H}^\star of the active constraints with related kernel generator \bm{K}:
\bm{H}^\star= \begin{bmatrix} \sqrt 3 - 1 & 1 + \sqrt 3 & 0 \\ 1 & 0 & 1 \end{bmatrix} \qquad \bm{K} = \begin{bmatrix} -\sqrt 3 - 1 \\ \sqrt 3 - 1 \\ \sqrt 3 + 1 \end{bmatrix}
In order to check the second order necessary/sufficient conditions we have to compute the Hessian matrix \nabla^2_{\bm{x}} \mathcal{L} of the Lagrangian, which is
\begin{aligned} \nabla^2_{\bm{x}} \mathcal{L} = \bm{L} & = \begin{bmatrix} -2 \lambda_1 & 0 & 0 \\ 0 & -2 \lambda_1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad \implies\\ \bm{L}^\star &= \begin{bmatrix} \sqrt 3 - 1 & 0 & 0 \\ 0 & \sqrt 3 - 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \end{aligned}
We now have to determine whether this matrix is (semi-)positive definite on the kernel of the active constraints in order to decide if the point is a minimum; this means performing the following matrix multiplication:
\begin{aligned} \alpha \bm{K}^t \bm{L}^\star \bm{K} \alpha & = \alpha \begin{bmatrix} -\sqrt 3 - 1 & \sqrt{3} - 1 & \sqrt{3} + 1 \end{bmatrix} \\ &\qquad \begin{bmatrix} \sqrt 3 - 1 & 0 & 0 \\ 0 & \sqrt 3 - 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} -\sqrt 3 - 1 \\ \sqrt 3 - 1 \\ \sqrt 3 +1 \end{bmatrix}\alpha \\ & = \alpha \begin{bmatrix} -\sqrt 3 - 1 & \sqrt 3 - 1 & \sqrt 3 + 1 \end{bmatrix} \begin{bmatrix} -2 \\ 4 -2\sqrt 3 \\ 2\sqrt 3 + 2 \end{bmatrix}\alpha \\ & = \alpha \big( 2\sqrt{3} + 2 + 6\sqrt{3} - 10 + 8 + 4\sqrt{3} \big) \alpha \\ & = 12\sqrt{3}\,\alpha^2 \end{aligned}
Observing that for \alpha \neq 0 the product is always greater than zero, the Hessian L^\star is positive definite on the kernel of the active constraints; the second order sufficient condition is satisfied, allowing us to state that
\bm{x}^\star = \left( \frac{\sqrt{3}}{2} - \frac{1}{2}, \frac{1}{2}+ \frac{\sqrt{3}}{2}, \frac{3}{2}- \frac{\sqrt{3}}{2} \right) is a minimum.
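A quick numerical confirmation of the quadratic form above; a sketch assuming Python with numpy:

```python
import numpy as np

s3 = np.sqrt(3.0)
K = np.array([-s3 - 1.0, s3 - 1.0, s3 + 1.0])   # kernel generator from Example 8
L_star = np.diag([s3 - 1.0, s3 - 1.0, 2.0])     # Hessian of the Lagrangian
print(K @ L_star @ K, 12 * s3)   # both approximately 20.78 > 0
```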
Example 10 Given the problem
\begin{aligned} \textrm{minimize:}&& f(x,y) &= x^2 - xy \\ \textrm{subject to:}&& g_1(x,y) &= 1-x^2-y^2 \geq 0 \\ && g_2(x,y) &= 1- (x-1)^2-y^2 \geq 0 \end{aligned}
To solve this kind of problem using the KKT conditions we first need to build the Lagrangian of the problem, which equals
\begin{aligned} \mathcal{L}(x,y,\mu_1,\mu_2) & = x^2-xy - \mu_1 \big(1-x^2-y^2\big) \\ & - \mu_2 \big[ 1- (x-1)^2 - y^2 \big] \end{aligned}
At this point it’s possible to construct the non linear system of equation whose solution are the stationary points candidate to be local minimum:
\left\{\begin{aligned} \left. \begin{aligned} 2x - y +2\mu_1 x + 2\mu_2(x-1) &= 0 \\ -x + 2 \mu_1 y +2 \mu_2 y &= 0 \end{aligned} \right\} & \quad \nabla_{\bm{x}} \mathcal{L}\big(\bm{x}, \bm{\lambda}\big) \\ \left. \begin{aligned} \mu_1 \big(1-x^2-y^2\big) & = 0 \\ \mu_2 \big[ 1- (x-1)^2 - y^2 \big] & = 0 \end{aligned} \right\} & \quad \mu_k g_k\big (\bm{x} \big) = 0\\ \mu_1,\mu_2 \geq 0 \quad & \end{aligned} \right.
This system can be solved by cases on the values of \mu_i, in particular:
when \mu_1 = \mu_2 = 0 the system reduces to the form \begin{cases} 2x - y = 0 \\ - x = 0 \end{cases} \quad \implies \quad x = y = 0 This solution satisfies the constraints, in fact g_1(0,0) = 1 \geq 0 and g_2(0,0) = 0 \geq 0, so this point can be used to compute the second order conditions;
considering all the other cases for \mu_j, the only other analytical solution of the system, found with Mathematica, determines the point (x,y,\mu_1,\mu_2) = \left( \frac 1 2, \frac{\sqrt 3}{2}, \frac 1{\sqrt 3} - \frac 12 , \frac 1 2 - \frac{\sqrt{3}}{6} \right)
In order to consider the second order necessary/sufficient conditions we first have to evaluate the gradient \nabla \bm{g} of the inequality constraint map (with respect to the variables x,y), resulting in the matrix
\nabla \bm{g} = \begin{bmatrix} -2x & -2y \\ - 2(x-1) & -2y \end{bmatrix}
For the second order conditions it is also important to define the Hessian matrix of the Lagrangian with respect to the variables x,y:
\nabla_{\bm{x}}^2 \mathcal{L} = \begin{bmatrix} 2 + 2\mu_1 + 2 \mu_2 & -1 \\ -1 & 2\mu_1 + 2\mu_2 \end{bmatrix}
We now have to check the conditions for the two local minimum candidates:
- in the first case, when x=y = \mu_1 = \mu_2 = 0, the set of active constraints contains just the second inequality g_2: in fact g_1(0,0) = 1 \neq 0 while g_2(0,0) = 0. We have to compute the kernel of its gradient (to determine the directions z on which to verify the KKT condition), in particular \nabla g_2(0,0) = \begin{bmatrix} 2 & 0 \end{bmatrix} \quad \implies \quad \ker\big\{ \nabla g_2(0,0) \big\} = \alpha\begin{bmatrix} 0 \\ 1\end{bmatrix} Evaluating the expression \bm{z}^t \, \nabla_{\bm{x}}^2 \mathcal{L} \, \bm{z} for \bm{z}\in \ker\{\nabla g_2\} we obtain \alpha \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & - 1 \\ - 1 & 0\end{bmatrix}\begin{bmatrix}0 \\ 1\end{bmatrix} \alpha = 0 The quadratic form vanishes on the kernel: the necessary condition (\geq 0) is satisfied but the sufficient one (> 0) is not, so this candidate falls in the grey zone (indeed, inspecting f = x(x-y) along the boundary of g_2 shows that it is not a local minimum).
Example 11 (problem from an exam) Given the problem
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,z) = x - y + z^2 \\ \textrm{subject to:} \qquad & h(x,y,z) = x^2 + y^2 - 2 = 0 \\ & g(x,y,z) = x^2-1 \geq 0 \end{aligned}
the solution can be obtained by using the KKT conditions; in order to use them we first need to define the Lagrangian \mathcal{L} = f - \lambda h - \mu g
of the problem, which in this case is of the form
\mathcal{L}(x,y,z,\lambda,\mu) = x - y + z^2 - \lambda\big(x^2+y^2-2\big) - \mu \big(x^2 - 1\big)
The first order necessary condition allows us to build the system of non-linear equations whose solutions are the candidates to be minimum points of f (with respect to the constraint set), and so we have
\begin{cases} 1 -2 \lambda x - 2\mu x = 0 & : \partial/\partial_x \mathcal{L} \\ -1 - 2\lambda y = 0 & : \partial/\partial_y \mathcal{L} \\ 2z = 0 & : \partial/\partial_z \mathcal{L} \\ x^2 + y^2 - 2 = 0 & : h \\ x^2-1 \geq 0 & : g\\ \mu(x^2-1) = 0 &: \mu g\\ \mu \geq 0 & : \mu \end{cases}
Verifying that the point \bm{x}^\star = (1,1,0) is a candidate minimum with \lambda^\star = -\frac1 2 and \mu^\star = 1 (substituting these values in the system, all the relations are satisfied), the associated set of active constraints contains both h (in fact 1+1-2=0) and g (indeed 1-1=0), whose gradients can be collected as
\begin{aligned} \nabla \mathcal A & = \begin{bmatrix} \nabla h \\ \nabla g \end{bmatrix} = \begin{bmatrix} 2x & 2y & 0 \\ 2x & 0 & 0 \end{bmatrix} \quad \implies \quad \\ \bm{H} & = \nabla \mathcal A(\bm{x}^\star) = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 0 & 0 \end{bmatrix} \end{aligned}
The constraints are thus qualified, because the gradients are linearly independent; the related kernel, obtained by solving the equation H\bm{x} = \bm{0}, is generated by \bm{K} = (0,0,1).
In order to determine whether \bm{x}^\star is a minimum point using the second order necessary/sufficient KKT conditions it's mandatory to compute the Hessian \nabla^2_{\bm{x}} \mathcal{L} of the Lagrangian:
\begin{aligned} \nabla^2_{\bm{x}}\mathcal{L} & = \begin{bmatrix} -2(\lambda+\mu) & 0 & 0 \\ 0 & -2\lambda & 0 \\ 0 & 0 & 2 \end{bmatrix} \quad \implies \quad \\ \bm{L} & = \nabla^2_{\bm{x}} \mathcal{L}(\bm{x}^\star) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \end{aligned}
We now need to check whether the matrix \bm{L} is (semi-)positive definite on the kernel of the active constraints, and so we have to compute
\begin{aligned} \alpha\bm{K}^t\bm{L}\bm{K} \alpha & = \alpha \begin{bmatrix} 0 & 0 & 1\end{bmatrix}\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\0 \\ 1 \end{bmatrix}\alpha \\ & = \alpha \begin{bmatrix}0 & 0 & 1\end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \alpha \\ & = 2\alpha^2 \end{aligned}
Observing that this product is always greater than zero for \alpha \neq 0, the matrix L is positive definite on the kernel of the active constraints and so
\bm{x}^\star = (1,1,0) \qquad \textrm{is a minimum point}
Example 12 (problem from an exam) Given the problem
\begin{aligned} \textrm{minimize:} \qquad & f(x,y,z) = x^2-y+z \\ \textrm{subject to:} \qquad & h(x,y,z) = y-x-1 = 0 \\ & g_1(x,y,z) = x-1\geq 0 \\ & g_2 (x,y,z) = z-1 \geq 0 \end{aligned}
the associated lagrangian is
\begin{aligned} \mathcal{L}(x,y,z,\lambda,\mu_1,\mu_2) & = x^2-y+z - \lambda(y-x-1) \\ & - \mu_1 (x-1) - \mu_2(z-1) \end{aligned}
The non-linear system associated to the first order necessary KKT condition is then
\begin{cases} 2x + \lambda - \mu_1 = 0 & : \partial/\partial_x \mathcal{L}\\ -1 - \lambda = 0 &: \partial/\partial_y \mathcal{L} \\ 1 - \mu_2 = 0 & : \partial/\partial_z \mathcal{L} \\ y-x-1 = 0 & : h \\ x-1 \geq 0 & : g_1 \\ z-1 \geq 0 & : g_2 \\ \mu_1(x-1) = 0 & : \mu_1 g_1 \\ \mu_2(z-1) = 0 & : \mu_2 g_2 \\ \mu_1,\mu_2 \geq 0 \end{cases}
The only solution of this system is
\begin{aligned} x^\star &= 1 \\ y^\star &= 2 \\ z^\star &=1 \\ \lambda^\star &= - 1 \\ \mu_1^\star &= \mu_2^\star = 1 \end{aligned}
For such a point the set of active constraints \mathcal{A} contains all the constraints \{h,g_1,g_2\} (in fact g_1(\bm{x}^\star) = g_2(\bm{x}^\star) = 0), whose gradient matrix is
\bm{H} = \begin{bmatrix} \nabla h \\ \nabla g_1 \\ \nabla g_2 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
This is a 3\times 3 matrix with linearly independent rows, so the kernel obtained by solving \bm{H}\bm{z} = \bm{0} consists only of the null vector \bm{z} = \bm{0}. Having determined the Hessian
\bm{L} = \nabla_{\bm{x}} ^2 \mathcal{L} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
there is no non-zero direction \bm{z} in the kernel of the active constraints on which \bm{z}^t \bm{L} \bm{z} must be tested, so the second order sufficient condition is vacuously satisfied and \bm{x}^\star = (1,2,1) is a strict local minimum. This can also be seen directly: on the feasible set y = x+1, x \geq 1, z \geq 1 the objective reduces to f = x^2 - x - 1 + z, which is increasing in both x and z starting from (x,z) = (1,1).
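The direct argument can be checked numerically by sampling the feasible set; a minimal sketch assuming Python:

```python
# On the feasible set y = x + 1, x >= 1, z >= 1 the objective reduces to
# f = x**2 - (x + 1) + z, increasing in both x and z from (x, z) = (1, 1).
f = lambda x, z: x**2 - (x + 1) + z

print(f(1.0, 1.0))   # value 0 at the candidate point (1, 2, 1)
for x0, z0 in [(1.1, 1.0), (1.0, 1.3), (1.5, 2.0)]:   # feasible samples
    print(f(x0, z0) > f(1.0, 1.0))   # True: every sample is strictly worse
```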