Exercises on KKT
Example 1: Detailed Example of Constrained Minimization
In this example, we will minimize the function
\begin{aligned} \textrm{minimize:}\quad & f(x, y) = x + y, \\ \textrm{subject to the constraint:}\quad & h(x, y) = x^2 + y^2 - 1 = 0, \end{aligned}
where h(x, y) = 0 describes a unit circle. We will approach this problem using two methods:
- Parametrization with x = \cos(t) and y = \sin(t).
- Lagrange Multipliers.
Parametric Solution
The constraint h(x, y) = x^2 + y^2 - 1 represents a unit circle. We can parametrize the constraint as:
x(t) = \cos(t), \quad y(t) = \sin(t), \quad t \in [0, 2\pi].
Substituting the parametrization into the objective function yields:
f(\cos(t), \sin(t)) = \cos(t) + \sin(t).
To find the critical points, we differentiate this function of t and set the derivative to zero.
Critical Points
The maximum and minimum values of f can be found by solving for t:
- The function f(\cos(t), \sin(t)) achieves its extrema when:
\frac{\mathrm{d}}{\mathrm{d}t} (\cos(t) + \sin(t)) = -\sin(t) + \cos(t) = 0.
This gives us:
\sin(t) = \cos(t) \quad \Rightarrow \quad t = \frac{\pi}{4} + k\pi, \, k \in \mathbb{Z}.
Function Values
Evaluating the function at these points:
- At t = \dfrac{5\pi}{4} (minimum):
\left. \begin{aligned} x &= \cos\left(\frac{5\pi}{4}\right) = -\frac{\sqrt{2}}{2}, \\ y &= \sin\left(\frac{5\pi}{4}\right) = -\frac{\sqrt{2}}{2}, \end{aligned} \right\} \quad \implies\quad f(x, y) = -\sqrt{2}.
- At t = \dfrac{\pi}{4} (maximum):
\left. \begin{aligned} x &= \cos\left(\frac{\pi}{4}\right) = \frac{\sqrt{2}}{2}, \\ y &= \sin\left(\frac{\pi}{4}\right) = \frac{\sqrt{2}}{2}, \end{aligned} \right\} \quad\implies\quad f(x, y) = \sqrt{2}.
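Before plotting, we can verify these values numerically. A minimal base-R check (no extra packages) that scans the parametrization on a fine grid of t:

```r
# f restricted to the unit circle: f(cos(t), sin(t)) = cos(t) + sin(t)
f_param <- function(t) cos(t) + sin(t)

t_grid <- seq(0, 2 * pi, length.out = 100001)
vals   <- f_param(t_grid)

t_grid[which.min(vals)] / pi  # ~1.25, i.e. t = 5*pi/4 (minimum)
min(vals)                     # ~ -1.4142 = -sqrt(2)
t_grid[which.max(vals)] / pi  # ~0.25,  i.e. t = pi/4  (maximum)
max(vals)                     # ~  1.4142 =  sqrt(2)
```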
Visualization
Let’s visualize the unit circle and the level sets of the objective function.
Code
# R code to plot the level sets, constraint, normals, and gradient field
library(ggplot2)
library(dplyr)
library(latex2exp)

# Generate data for the unit circle
theta <- seq(0, 2 * pi, length.out = 100)
x_circle <- cos(theta)
y_circle <- sin(theta)

# Create a coarser grid for the level sets and fewer arrows
x_range <- seq(-1.5, 1.5, length.out = 10)
y_range <- seq(-1.5, 1.5, length.out = 10)
grid <- expand.grid(x = x_range, y = y_range)

# Calculate f(x, y) for the level sets
grid$f <- with(grid, x + y)

# Gradient of f(x, y) = x + y is constant: [1, 1]
grid$fx <- 1  # gradient in the x direction
grid$fy <- 1  # gradient in the y direction

# Normal vectors to the circle at selected points
circle_normals <- data.frame(
  x = cos(seq(0, 2 * pi, length.out = 10)),
  y = sin(seq(0, 2 * pi, length.out = 10))
) %>%
  mutate(nx = x, ny = y)  # the normal at (x, y) on the unit circle is (x, y) itself

# Plot the constraint, level sets, gradient field, and normals
ggplot() +
  # Circle with border
  geom_polygon(aes(x = x_circle, y = y_circle),
               fill = "lightblue", color = "black", alpha = 0.5) +
  # Contour lines for f(x, y) = x + y
  geom_contour(data = grid, aes(x = x, y = y, z = f), bins = 10, color = "blue") +
  # Gradient field for f(x, y) = x + y
  geom_segment(data = grid,
               aes(x = x, y = y, xend = x + 0.15 * fx, yend = y + 0.15 * fy),
               arrow = arrow(length = unit(0.2, "cm")), color = "orange") +
  # Normals to the circle
  geom_segment(data = circle_normals,
               aes(x = x, y = y, xend = x + 0.2 * nx, yend = y + 0.2 * ny),
               arrow = arrow(length = unit(0.2, "cm")), color = "darkgreen") +
  # Minimum and maximum points
  geom_point(aes(x = -sqrt(2)/2, y = -sqrt(2)/2), color = "red", size = 3) +   # minimum
  geom_point(aes(x = sqrt(2)/2, y = sqrt(2)/2), color = "green", size = 3) +   # maximum
  # Labels and limits
  labs(title = TeX("Gradient for $f(x,y)=x+y$ and constraint $h(x,y)=x^2+y^2-1$"),
       x = "x", y = "y") +
  xlim(-1.5, 1.5) + ylim(-1.5, 1.5) +
  coord_fixed() +  # fix the aspect ratio to 1:1
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14))  # centered title
Lagrange Multipliers Method
Now, we will apply the method of Lagrange multipliers to our constrained optimization problem. We define the Lagrangian as follows:
\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda h(x, y) = x + y - \lambda (x^2 + y^2 - 1).
First-Order Conditions
To find the extrema, we take the gradients of the Lagrangian with respect to x, y, and \lambda. This leads to the first-order conditions:
For x: \nabla_x \mathcal{L} = 1 - 2\lambda x = 0 \quad \Rightarrow \quad \lambda = \frac{1}{2x}.
For y: \nabla_y \mathcal{L} = 1 - 2\lambda y = 0 \quad \Rightarrow \quad \lambda = \frac{1}{2y}.
The constraint condition: h(x, y) = x^2 + y^2 - 1 = 0.
Solving the nonlinear system
Setting the two expressions for \lambda equal gives us:
\frac{1}{2x} = \frac{1}{2y} \quad \Rightarrow \quad x = y.
Next, substituting x = y into the constraint:
2x^2 = 1 \quad \Rightarrow \quad x = \pm \frac{1}{\sqrt{2}}, \quad y = x,
so the two candidate points are \pm\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right).
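Before moving to the second-order analysis, a small base-R sketch confirms that both candidates satisfy the first-order system (all residuals vanish up to rounding):

```r
# Residuals of the Lagrange system: dL/dx, dL/dy, and the constraint h
kkt_residual <- function(x, y, lambda) {
  c(1 - 2 * lambda * x,   # dL/dx = 0
    1 - 2 * lambda * y,   # dL/dy = 0
    x^2 + y^2 - 1)        # h = 0
}

s <- 1 / sqrt(2)
kkt_residual( s,  s, lambda =  s)  # candidate maximum: c(0, 0, 0)
kkt_residual(-s, -s, lambda = -s)  # candidate minimum: c(0, 0, 0)
```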
Second-Order Conditions
To determine whether the critical points are minima or maxima, we need to evaluate the second-order conditions.
Second-Order Necessary Condition: The Hessian of the Lagrangian, \nabla_{(x,y)}^2 \mathcal{L}(x, y, \lambda), must be positive semidefinite on the null space of the constraint gradients.
Second-Order Sufficient Condition: The Hessian of the Lagrangian, \nabla_{(x,y)}^2 \mathcal{L}(x, y, \lambda), must be positive definite on the null space of the constraint gradients.
Computing the Hessian
The Hessian matrix of the Lagrangian with respect to x and y is given by:
\nabla^2 \mathcal{L}(x, y, \lambda) = \begin{bmatrix} \dfrac{\partial^2 \mathcal{L}}{\partial x^2} & \dfrac{\partial^2 \mathcal{L}}{\partial x \partial y} \\[1em] \dfrac{\partial^2 \mathcal{L}}{\partial y \partial x} & \dfrac{\partial^2 \mathcal{L}}{\partial y^2} \end{bmatrix} = \begin{bmatrix} -2\lambda & 0 \\ 0 & -2\lambda \end{bmatrix}.
Kernel of the Constraint
The kernel of the constraint
h(x, y) = 0
consists of the vectors \bm{z} that satisfy
\nabla h(x, y) \cdot \bm{z} = 0.
The gradient of the constraint is:
\nabla h(x, y) = [2x, 2y]
and it is easy to verify that the kernel of the constraint gradient is spanned by the vector \bm{z} = \begin{pmatrix} -y \\ x \end{pmatrix}.
Check qualification
The gradient of the active constraint, \nabla h(x, y), consists of a single nonzero row. It is therefore full rank and satisfies the LICQ (linear independence constraint qualification).
Evaluating the Hessian at Critical Points
At the point (x^\star,y^\star)=\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) the multiplier is \lambda = \frac{1}{2x^\star} = -\frac{1}{\sqrt{2}}. The kernel of \nabla h(x^\star,y^\star) is spanned by \bm{z} = t \begin{pmatrix} y^\star\\-x^\star\end{pmatrix} = \frac{t}{\sqrt{2}} \begin{pmatrix} -1\\1\end{pmatrix} \in\ker(\nabla h(x^\star,y^\star)).
Thus the Hessian and the second-order condition become: \begin{aligned} \nabla^2 \mathcal{L} & = -2\lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}, \qquad \bm{z}^\top \nabla^2 \mathcal{L} \bm{z}\quad\implies \\[1em] & \frac{t^2}{2} \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \sqrt{2}\,t^2 > 0, \end{aligned} for all t\neq 0. Therefore, this point is a local minimum.
At the point (x^\star,y^\star)=\left(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right) the multiplier is \lambda = \frac{1}{2x^\star} = \frac{1}{\sqrt{2}}, and the kernel is spanned by \bm{z} = \frac{t}{\sqrt{2}} \begin{pmatrix} 1\\-1\end{pmatrix}.
Thus the Hessian and the second-order condition become: \begin{aligned} \nabla^2 \mathcal{L} & = -2\lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\sqrt{2} & 0 \\ 0 & -\sqrt{2} \end{bmatrix}, \qquad \bm{z}^\top \nabla^2 \mathcal{L} \bm{z}\quad\implies \\[1em] & \frac{t^2}{2} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} -\sqrt{2} & 0 \\ 0 & -\sqrt{2} \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = -\sqrt{2}\,t^2 < 0,\qquad \forall t\neq 0. \end{aligned} Therefore, this point is a local maximum.
Conclusion
Both methods yield the same results:
- The minimum occurs at \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) with f(x, y) = -\sqrt{2}.
- The maximum occurs at \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) with f(x, y) = \sqrt{2}.
Example 2: Detailed example of KKT
In this example, we will minimize the function
\begin{aligned} \textrm{minimize:}\quad & f(x, y, z ) = x^2 + y + z, \\ \textrm{equality constraint:}\quad & x+y^2+z^2=1, \\ \textrm{and inequalities:}\quad & z\leq 1, \quad x \geq 0, \end{aligned}
Rewriting in Standard Form
We can rewrite the problem in a more standard form:
\begin{aligned} \textrm{minimize:}\quad & f(x, y, z ) = x^2 + y + z, \\ \textrm{equality constraint:}\quad & h(x,y,z) = x+y^2+z^2-1 = 0, \\ \textrm{and inequalities:}\quad & g_1(x,y,z) = 1-z\geq 0, \\ & g_2(x,y,z) = x \geq 0, \end{aligned}
Here, h(x, y, z) = 0 describes a paraboloid of revolution in \mathbb{R}^3 (equivalently, x = 1 - y^2 - z^2). We will now solve this problem using the KKT (Karush-Kuhn-Tucker) conditions.
Lagrangian Function
We define the Lagrangian function as:
\mathcal{L}(x,y,z,\lambda,\mu_1,\mu_2)= f(x,y,z) - \lambda h(x,y,z) - \mu_1 g_1(x,y,z) - \mu_2 g_2(x,y,z)
Substituting the expressions for f, h, g_1, and g_2:
\mathcal{L}(x,y,z,\lambda,\mu_1,\mu_2) = x^2 + y + z - \lambda(x+y^2+z^2-1) - \mu_1( 1-z) - \mu_2x
First-Order Conditions
To satisfy the first-order conditions, we compute the partial derivatives of \mathcal{L}:
\left\{\begin{aligned} \dfrac{\partial\mathcal{L}}{\partial x} &= 2x-\lambda-\mu_2 = 0\\ \dfrac{\partial\mathcal{L}}{\partial y} &= 1-2\lambda y = 0\\ \dfrac{\partial\mathcal{L}}{\partial z} &= 1-2\lambda z+\mu_1 = 0 \end{aligned}\right.
The KKT conditions also require the equality constraint and the complementary-slackness conditions to hold, together with the sign conditions \mu_1, \mu_2 \geq 0: \left\{\begin{aligned} x+y^2+z^2-1 & = 0\\ \mu_1(1-z) & = 0\\ \mu_2 x & = 0 \end{aligned}\right.
We now proceed by cases on the values of \mu_1 and \mu_2.
Case 1: \mu_1=\mu_2=0
In this case, the first-order conditions simplify to:
\left\{\begin{aligned} & 2x-\lambda = 0\\ & 1-2\lambda y = 0\\ & 1-2\lambda z = 0\\ & x+y^2+z^2-1 = 0\\ \end{aligned}\right. \implies \left\{\begin{aligned} &x=\lambda/2\\ &y=1/(2\lambda)\\ &z=1/(2\lambda)\\ & \lambda/2+2/(2\lambda)^2-1 = 0 \end{aligned}\right. \implies \lambda^3-2\lambda^2+1=0. Solving the cubic (e.g. with Maple), the three real roots are \lambda = 1, \; \dfrac{1+\sqrt{5}}{2}, \; \dfrac{1-\sqrt{5}}{2}.
From these values, we compute the corresponding x, y, and z:
\mu_1=\mu_2=0,\quad \left\{\begin{aligned} &x=y=z=1/2, \quad& \lambda&=1 \\ &x = (1+\sqrt{5})/4,\quad& y &= z = 1/(\sqrt{5}+1), \quad \lambda = (\sqrt{5}+1)/2 \\ &{\color{red}x = (1-\sqrt{5})/4<0},\quad& y &= z = 1/(1-\sqrt{5}), \quad \lambda = (1-\sqrt{5})/2 \end{aligned}\right. The last solution must be discarded, since it violates x \geq 0.
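The roots of the cubic can also be confirmed numerically in base R; polyroot() takes the coefficients in increasing powers:

```r
# lambda^3 - 2*lambda^2 + 1 = 0, coefficients of 1, lambda, lambda^2, lambda^3
polyroot(c(1, 0, -2, 1))
# 1, (1 - sqrt(5))/2 ~ -0.618, (1 + sqrt(5))/2 ~ 1.618 (all real)
```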
Case 2: \mu_1=0, and \mu_2>0
In this case, x = 0, and the system becomes:
\left\{\begin{aligned} &2x-\lambda-\mu_2 = 0\\ &1-2\lambda y = 0\\ &1-2\lambda z = 0 \\ &x+y^2+z^2-1 = 0\\ &x = 0 \end{aligned}\right. \implies \left\{\begin{aligned} &\lambda=-\mu_2\\ &y=1/(2\lambda)=-1/(2\mu_2)\\ &z=1/(2\lambda)=-1/(2\mu_2)\\ &2/(2\mu_2)^2-1 = 0\\ &x = 0 \end{aligned}\right. \implies \mu_2^2=\dfrac{1}{2}. The positive root is \mu_2=1/\sqrt{2}, so that x=0, \quad y = z = -\dfrac{1}{2\mu_2} = -\dfrac{\sqrt{2}}{2},\quad \lambda=-\dfrac{\sqrt{2}}{2},\quad \mu_1=0,\quad \mu_2=\dfrac{\sqrt{2}}{2}.
Case 3: \mu_2=0, and \mu_1>0
\left\{\begin{aligned} & 2x-\lambda = 0\\ & 1-2\lambda y = 0\\ & 1-2\lambda z+\mu_1 = 0 \\ &x+y^2+z^2-1= 0\\ &z=1 \end{aligned}\right. \implies \left\{\begin{aligned} & 2x-\lambda = 0\\ & 1-2\lambda y = 0\\ & 1-2\lambda +\mu_1 = 0 \\ &x+y^2= 0\\ &z=1 \end{aligned}\right. \implies \left\{\begin{aligned} & x=\lambda/2\\ & y=1/(2\lambda)\\ & \mu_1 = 2\lambda-1 \\ &x+y^2= 0\\ &z=1 \end{aligned}\right.
so that \lambda solves
\dfrac{\lambda}{2}+\dfrac{1}{4\lambda^2}=0\;\implies\; 2\lambda^3+1=0\;\implies\; \lambda=-2^{-1/3},
and
{\color{red}x = -2^{-4/3}< 0},\quad y=-2^{-2/3}, \quad z=1, \quad \lambda=-2^{-1/3}, \quad {\color{red}\mu_1 = 2\lambda-1 = -2^{2/3}-1<0}, \quad \mu_2=0.
The solution must be discarded: it violates both x \geq 0 and \mu_1 \geq 0.
Case 4: \mu_2>0 and \mu_1>0
\left\{\begin{aligned} & 2x-\lambda-\mu_2 = 0\\ & 1-2\lambda y = 0\\ & 1-2\lambda z+\mu_1 = 0\\ &x+y^2+z^2-1 = 0\\ &z=1\\ &x=0 \end{aligned}\right. \implies \left\{\begin{aligned} & -\lambda-\mu_2 = 0\\ & 1-2\lambda y = 0\\ & 1-2\lambda +\mu_1 = 0\\ &y^2 = 0\\ &z=1\\ &x=0 \end{aligned}\right. \implies 1-2\lambda \cdot 0 = 0, \quad \textrm{no solution}
All the solutions

| \bm{x} | \bm{y} | \bm{z} | \bm{\lambda} | \bm{\mu_1} | \bm{\mu_2} |
|---|---|---|---|---|---|
| \dfrac{1}{2} | \dfrac{1}{2} | \dfrac{1}{2} | 1 | 0 | 0 |
| \dfrac{1+\sqrt{5}}{4} | \dfrac{1}{\sqrt{5}+1} | \dfrac{1}{\sqrt{5}+1} | \dfrac{\sqrt{5}+1}{2} | 0 | 0 |
| 0 | -\dfrac{\sqrt{2}}{2} | -\dfrac{\sqrt{2}}{2} | -\dfrac{\sqrt{2}}{2} | 0 | \dfrac{\sqrt{2}}{2} |
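Before the second-order analysis, it is informative to compare the objective at the three candidates (a quick base-R check); the third candidate already has the smallest value:

```r
f <- function(x, y, z) x^2 + y + z

f(1/2, 1/2, 1/2)                                      # 1.25
f((1 + sqrt(5))/4, 1/(sqrt(5) + 1), 1/(sqrt(5) + 1))  # ~ 1.2725
f(0, -sqrt(2)/2, -sqrt(2)/2)                          # ~ -1.4142
```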
Study of solution 1: x=y=z=\dfrac{1}{2}, \lambda=1, \mu_1=\mu_2=0
Compute active constraints
\begin{aligned} g_1(x,y,z) & = 1-z = 1-\dfrac{1}{2}=\dfrac{1}{2} > 0, \qquad & \textrm{\color{DarkGreen}(not-active)} \\ g_2(x,y,z) & = x = \dfrac{1}{2}> 0, \qquad & \textrm{\color{DarkGreen}(not-active)} \end{aligned}
Gradient of active constraints
\nabla h(x,y,z) = \nabla (x+y^2+z^2-1) = [1,2y,2z] = [1,1,1]
Find the kernel of the active constraint:
[1,1,1]\begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} = z_1+z_2+z_3 = 0 \implies \begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} = \begin{pmatrix} -\alpha-\beta\\\alpha\\\beta\end{pmatrix} = \underbrace{ \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1\end{pmatrix} }_{\bm{K}^a} \begin{pmatrix} \alpha \\ \beta\end{pmatrix}
Check qualification
The gradient of the active constraint, \nabla h(x, y, z), consists of a single nonzero row. It is therefore full rank and satisfies the LICQ (linear independence constraint qualification).
Compute the Hessian and the reduced Hessian
\nabla^2_{(x,y,z)}\mathcal{L} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2\lambda & 0 \\ 0 & 0 & -2\lambda \end{pmatrix} \implies \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}
Reduced Hessian: \begin{aligned} \bm{K}^{a,\top}\nabla^2_{(x,y,z)}\mathcal{L}\,\bm{K}^a &= \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix} \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1\end{pmatrix} \\ &= \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} -2 & -2 \\ -2 & 0 \\ 0 & -2\end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix} \end{aligned}
Use Sylvester's criterion to check positivity:
- \bm{H}_1 = (0): the first leading minor vanishes, so positive definiteness already fails; we inspect the next minor.
- \det\begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}=-4<0 [NO]

A negative second leading minor rules out positive semidefiniteness as well (the eigenvalues are \pm 2), so the reduced Hessian is indefinite: the point is not a minimum.
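The same conclusion follows in R from the eigenvalues of the reduced Hessian (a base-R sketch):

```r
K  <- matrix(c(-1, 1, 0,
               -1, 0, 1), nrow = 3)  # columns span ker(grad h) at (1/2, 1/2, 1/2)
H  <- diag(c(2, -2, -2))             # Hessian of the Lagrangian at lambda = 1
Hr <- t(K) %*% H %*% K               # reduced Hessian: [[0, 2], [2, 0]]
eigen(Hr)$values                     # 2 and -2: indefinite, so not a minimum
```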
Study of solution 2: x=\frac{\sqrt{5}+1}{4}, y=z=\frac{\sqrt{5}-1}{4}, \lambda=\frac{\sqrt{5}+1}{2}, \mu_1=\mu_2=0
Compute active constraints
\begin{aligned} g_1(x,y,z) & = 1-z = 1-\frac{\sqrt{5}-1}{4}=\frac{5-\sqrt{5}}{4}> 0, \qquad & \textrm{\color{DarkGreen}(not-active)} \\ g_2(x,y,z) & = x = \frac{\sqrt{5}+1}{4}> 0, \qquad & \textrm{\color{DarkGreen}(not-active)} \end{aligned}
Gradient of active constraints
\nabla h(x,y,z) = \begin{pmatrix}1 & 2y & 2z \end{pmatrix} = \begin{pmatrix} 1 & \dfrac{\sqrt{5}-1}{2} & \dfrac{\sqrt{5}-1}{2}\end{pmatrix}
Find the kernel of the active constraint:
\begin{aligned} \begin{pmatrix} 1 & \dfrac{\sqrt{5}-1}{2} & \dfrac{\sqrt{5}-1}{2}\end{pmatrix} \begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} & = z_1+\dfrac{\sqrt{5}-1}{2}(z_2+z_3) = 0 \\ & \Downarrow \\ \begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} & = \begin{pmatrix} \dfrac{1-\sqrt{5}}{2}(\alpha+\beta)\\ \alpha\\\beta\end{pmatrix} = \underbrace{ \begin{pmatrix} \dfrac{1-\sqrt{5}}{2} & \dfrac{1-\sqrt{5}}{2} \\ 1 & 0 \\ 0 & 1\end{pmatrix} }_{\bm{K}^a} \begin{pmatrix} \alpha \\ \beta\end{pmatrix} \end{aligned}
Check qualification
The gradient of the active constraint, \nabla h(x, y, z), consists of a single nonzero row. It is therefore full rank and satisfies the LICQ (linear independence constraint qualification).
Compute the Hessian and the reduced Hessian
\nabla^2_{(x,y,z)}\mathcal{L} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2\lambda & 0 \\ 0 & 0 & -2\lambda \end{pmatrix} \implies \begin{pmatrix} 2 & 0 & 0 \\ 0 & -(\sqrt{5}+1) & 0 \\ 0 & 0 & -(\sqrt{5}+1) \end{pmatrix}
Reduced Hessian, using the kernel basis \bm{K}^a computed above and writing c = \dfrac{1-\sqrt{5}}{2} (so that 2c^2 = 3-\sqrt{5}): \begin{aligned} \bm{K}^{a,\top}\nabla^2_{(x,y,z)}\mathcal{L}\,\bm{K}^a &= \begin{pmatrix} c & 1 & 0 \\ c & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & -(\sqrt{5}+1) & 0 \\ 0 & 0 & -(\sqrt{5}+1) \end{pmatrix} \begin{pmatrix} c & c \\ 1 & 0 \\ 0 & 1\end{pmatrix} \\ &= \begin{pmatrix} 2c^2-(\sqrt{5}+1) & 2c^2 \\ 2c^2 & 2c^2-(\sqrt{5}+1) \end{pmatrix} = \begin{pmatrix} 2-2\sqrt{5} & 3-\sqrt{5} \\ 3-\sqrt{5} & 2-2\sqrt{5} \end{pmatrix} \end{aligned}
Use Sylvester's criterion to check positivity:
- \bm{H}_1 = (2-2\sqrt{5}), determinant 2-2\sqrt{5}<0 [NO]

The first leading minor is already negative, so the reduced Hessian is not positive semidefinite: the point is not a minimum.
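Since the kernel basis here differs from that of solution 1, it is worth re-checking the reduced Hessian numerically (base R):

```r
c_ <- (1 - sqrt(5)) / 2
K  <- matrix(c(c_, 1, 0,
               c_, 0, 1), nrow = 3)  # columns span ker(grad h) at this point
H  <- diag(c(2, -(sqrt(5) + 1), -(sqrt(5) + 1)))
t(K) %*% H %*% K                     # [[2-2*sqrt(5), 3-sqrt(5)], [3-sqrt(5), 2-2*sqrt(5)]]
eigen(t(K) %*% H %*% K)$values       # both eigenvalues negative: not a minimum
```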
Study of solution 3: x=0, y=z=\lambda=-\frac{\sqrt{2}}{2}, \mu_1=0, \mu_2=\frac{\sqrt{2}}{2}
Compute active constraints
\begin{aligned} g_1(x,y,z) & = 1-z = 1+\frac{\sqrt{2}}{2}=\frac{2+\sqrt{2}}{2}> 0, \qquad & \textrm{\color{DarkGreen}(not-active)} \\ g_2(x,y,z) & = x = 0, \qquad & \textrm{\color{blue}(active)} \end{aligned}
Gradient of active constraints
\begin{pmatrix} \nabla h(x,y,z) \\ \nabla g_2(x,y,z) \end{pmatrix} = \begin{pmatrix} 1 & 2y & 2z \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & -\sqrt{2} & -\sqrt{2} \\ 1 & 0 & 0 \end{pmatrix}
Find the kernel of the active constraints:
\begin{aligned} \begin{pmatrix} 1 & -\sqrt{2} & -\sqrt{2} \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} & = \begin{pmatrix} z_1-\sqrt{2}(z_2+z_3) \\ z_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ & \Downarrow \\ \begin{pmatrix} z_1\\z_2\\z_3\end{pmatrix} & = \begin{pmatrix}0\\\alpha\\-\alpha\end{pmatrix} = \underbrace{ \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} }_{\bm{K}^a} \alpha \end{aligned} \tag{1}
Check qualification
The gradient matrix of the active constraints in Equation 1 has two linearly independent rows; hence it is full rank and satisfies the LICQ (linear independence constraint qualification).
Compute the Hessian and the reduced Hessian
\nabla^2_{(x,y,z)}\mathcal{L} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2\lambda & 0 \\ 0 & 0 & -2\lambda \end{pmatrix} \implies \begin{pmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix}
Reduced Hessian: \begin{aligned} \bm{K}^{a,\top}\nabla^2_{(x,y,z)}\mathcal{L}\,\bm{K}^a &= \begin{pmatrix} 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} \\ &= \begin{pmatrix} 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ \sqrt{2} \\ -\sqrt{2} \end{pmatrix} = 2\sqrt{2} > 0 \end{aligned}
Use Sylvester's criterion to check positivity:
- \bm{H}_1 = (2\sqrt{2}) > 0 [OK]

The reduced Hessian is positive definite, so the point is a local minimum.
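As an independent numerical check, we can hand the whole problem to a general constrained solver. A minimal sketch, assuming the alabama package is installed (its auglag() expects heq(x) = 0 and hin(x) >= 0); it should converge to the candidate found above:

```r
library(alabama)

fn  <- function(p) p[1]^2 + p[2] + p[3]        # objective f(x, y, z)
heq <- function(p) p[1] + p[2]^2 + p[3]^2 - 1  # equality constraint h = 0
hin <- function(p) c(1 - p[3], p[1])           # inequalities g1 >= 0, g2 >= 0

sol <- auglag(par = c(0.1, -0.5, -0.5), fn = fn, heq = heq, hin = hin)
sol$par  # expected: approximately (0, -0.7071, -0.7071)
```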
Example 3
Consider the problem
\begin{aligned} \textrm{minimize:}\quad & f(x, y) = y \\ \textrm{inequalities:}\quad & g_1(x,y) = x^2+y^2-1 \geq 0, \\ & g_2(x,y) = 1-x^2-\frac{1}{4}y^2 \geq 0. \end{aligned}
Code
# Define the functions for the implicit curves
library(ggplot2)

g1 <- function(x, y) { x^2 + y^2 - 1 }        # Circle: x^2 + y^2 = 1
g2 <- function(x, y) { 1 - x^2 - (y / 2)^2 }  # Ellipse: x^2 + (y/2)^2 = 1

# Create a grid of points
x_vals <- seq(-2, 2, length.out = 300)
y_vals <- seq(-2, 2, length.out = 300)

# Generate the data for g1 and g2
z_g1 <- outer(x_vals, y_vals, g1)
z_g2 <- outer(x_vals, y_vals, g2)

# Create a data frame for plotting with ggplot2
df <- expand.grid(x = x_vals, y = y_vals)
df$g1 <- as.vector(z_g1)
df$g2 <- as.vector(z_g2)

# Define the grid for the gradient vectors
vector_grid <- expand.grid(x = seq(-2, 2, length.out = 15),
                           y = seq(-2, 2, length.out = 15))
# The gradient of f(x, y) = y is (0, 1) at every point
vector_grid$u <- 0  # x-component of the gradient
vector_grid$v <- 1  # y-component of the gradient

# Plot using ggplot2
ggplot(df, aes(x = x, y = y)) +
  # Fill the first curve (circle) with semi-transparent red
  geom_contour_filled(aes(z = g1), breaks = c(-Inf, 0), fill = "red", alpha = 0.5) +
  # Fill the second curve (ellipse) with semi-transparent blue
  geom_contour_filled(aes(z = g2), breaks = c(-Inf, 0), fill = "blue", alpha = 0.5) +
  # Add contour lines for both curves
  geom_contour(aes(z = g1), breaks = 0, color = "red", linewidth = 1) +
  geom_contour(aes(z = g2), breaks = 0, color = "blue", linewidth = 1) +
  # Add the gradient field
  geom_segment(data = vector_grid,
               aes(x = x, y = y, xend = x + u * 0.2, yend = y + v * 0.2),
               arrow = arrow(length = unit(0.1, "inches")),
               color = "gray50",
               linewidth = 0.7) +
  # Mark the two candidate points (0, -2) and (0, 1)
  geom_point(aes(x = 0, y = -2), color = "red", size = 3) +
  geom_point(aes(x = 0, y = 1), color = "blue", size = 3) +
  coord_fixed() +  # fix the aspect ratio
  theme_minimal() +
  labs(title = "Circle and ellipse with gradient field of f(x, y) = y", x = "x", y = "y")
Lagrangian Function
We define the Lagrangian function as follows:
\mathcal{L}(x,y,\mu_1,\mu_2) = y - \mu_1(x^2 + y^2 - 1) - \mu_2\left(1 - x^2 - \frac{1}{4}y^2\right)
First-Order Conditions
The first-order conditions derived from the Lagrangian are:
\begin{aligned} \frac{\partial \mathcal{L}}{\partial x} &= 2x(\mu_2 - \mu_1) = 0, \\ \frac{\partial \mathcal{L}}{\partial y} &= 1 + y\left(\frac{1}{2}\mu_2 - 2\mu_1\right) = 0, \\ 0 &= \mu_1(x^2 + y^2 - 1), \\ 0 &= \mu_2\left(1 - x^2 - \frac{1}{4}y^2\right). \end{aligned}
Solutions of First-Order Conditions
The solutions to the first-order conditions yield the following critical points:
\begin{aligned} x &= 0, & y &= 1, & \mu_1 &= \frac{1}{2}, & \mu_2 &= 0, \\ x &= 0, & y &= -2, & \mu_1 &= 0, & \mu_2 &= 1. \end{aligned}
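A quick residual check of the first-order system at both candidates (base R; every entry should vanish):

```r
kkt_residual <- function(x, y, m1, m2) {
  c(2 * x * (m2 - m1),            # dL/dx = 0
    1 + y * (m2 / 2 - 2 * m1),    # dL/dy = 0
    m1 * (x^2 + y^2 - 1),         # complementarity for g1
    m2 * (1 - x^2 - y^2 / 4))     # complementarity for g2
}

kkt_residual(0,  1, m1 = 1/2, m2 = 0)  # c(0, 0, 0, 0)
kkt_residual(0, -2, m1 = 0,   m2 = 1)  # c(0, 0, 0, 0)
```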
Analysis of First Candidate
Compute Active Constraints
We evaluate the constraints at the first candidate (x, y) = (0, 1):
\begin{aligned} g_1(x,y) &= x^2 + y^2 - 1 = 0^2 + 1^2 - 1 = 0, & \text{(active)} \\ g_2(x,y) &= 1 - x^2 - \frac{1}{4}y^2 = 1 - 0^2 - \frac{1}{4}1^2 = \frac{3}{4} > 0, & \text{(not-active)} \end{aligned}
Gradient of Active Constraints
The gradient of the active constraint g_1 is computed as follows:
\nabla g_1(x,y) = [2x, 2y] = [0, 2].
Check qualification
The gradient of the active constraint, \nabla g_1(x,y) = [0, 2], is a single nonzero row; hence it is full rank and satisfies the LICQ (linear independence constraint qualification).
Kernel of the Gradient of Active Constraints
The kernel of the gradient can be expressed as:
[0, 2] \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = 2z_2 = 0 \implies \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 \\ 0 \end{pmatrix}}_{\bm{K}^a} \alpha.
Reduced Hessian
The reduced Hessian is given by:
\begin{aligned} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) &= \begin{pmatrix} 2(\mu_2 - \mu_1) & 0 \\ 0 & \frac{1}{2}(\mu_2 - 4\mu_1) \end{pmatrix} \\ & \implies \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \end{aligned}
Evaluating the quadratic form yields:
\begin{aligned} \bm{K}^{a,\top} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) \bm{K}^a &= \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = -1. \end{aligned}
Since the reduced Hessian is not positive semidefinite (the quadratic form is negative), this point is not a minimum.
Analysis of Second Candidate
Compute Active Constraints
Next, we evaluate the constraints at the second candidate (x, y) = (0, -2):
\begin{aligned} g_1(x,y) &= x^2 + y^2 - 1 = 0^2 + (-2)^2 - 1 = 3 > 0, & \text{(not-active)} \\ g_2(x,y) &= 1 - x^2 - \frac{1}{4}y^2 = 1 - 0^2 - \frac{1}{4}(-2)^2 = 0, & \text{(active)} \end{aligned}
Gradient of Active Constraints
The gradient of the active constraint g_2 is computed as:
\nabla g_2(x,y) = \left[-2x, -\frac{1}{2}y\right] = [0, 1].
Check qualification
The gradient of the active constraint, \nabla g_2(x,y) = [0, 1], is a single nonzero row; hence it is full rank and satisfies the LICQ (linear independence constraint qualification).
Kernel of the Gradient of Active Constraints
The kernel of the gradient can be expressed as:
[0, 1] \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = z_2 = 0 \implies \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 \\ 0 \end{pmatrix}}_{\bm{K}^a} \alpha.
Reduced Hessian
The reduced Hessian is given by:
\begin{aligned} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) &= \begin{pmatrix} 2(\mu_2 - \mu_1) & 0 \\ 0 & \frac{1}{2}(\mu_2 - 4\mu_1) \end{pmatrix} \\ & \implies \begin{pmatrix} 2 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}. \end{aligned}
Evaluating the quadratic form yields:
\begin{aligned} \bm{K}^{a,\top} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) \bm{K}^a &= \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2. \end{aligned}
Since the reduced Hessian is positive definite, this point is a minimum.
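A brute-force scan of the feasible set (outside the circle, inside the ellipse) confirms that y is smallest at (0, -2); a base-R sketch:

```r
# Sample a grid, keep the feasible points (g1 >= 0 and g2 >= 0), minimize y
pts  <- expand.grid(x = seq(-2, 2, length.out = 401),
                    y = seq(-2, 2, length.out = 401))
feas <- subset(pts, x^2 + y^2 - 1 >= 0 & 1 - x^2 - y^2 / 4 >= 0)
feas[which.min(feas$y), ]  # approximately x = 0, y = -2
```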
Example 4
Consider the problem
\begin{aligned} \textrm{minimize:}\quad & f(x, y) = x \\ \textrm{inequalities:}\quad & g_1(x,y) = x^2+y^2-1 \geq 0, \\ & g_2(x,y) = 1-x^2-\frac{1}{4}y^2 \geq 0. \end{aligned}
Code
# Define the functions for the implicit curves
library(ggplot2)

g1 <- function(x, y) { x^2 + y^2 - 1 }        # Circle: x^2 + y^2 = 1
g2 <- function(x, y) { 1 - x^2 - (y / 2)^2 }  # Ellipse: x^2 + (y/2)^2 = 1

# Create a grid of points
x_vals <- seq(-2, 2, length.out = 300)
y_vals <- seq(-2, 2, length.out = 300)

# Generate the data for g1 and g2
z_g1 <- outer(x_vals, y_vals, g1)
z_g2 <- outer(x_vals, y_vals, g2)

# Create a data frame for plotting with ggplot2
df <- expand.grid(x = x_vals, y = y_vals)
df$g1 <- as.vector(z_g1)
df$g2 <- as.vector(z_g2)

# Define the grid for the gradient vectors
vector_grid <- expand.grid(x = seq(-2, 2, length.out = 15),
                           y = seq(-2, 2, length.out = 15))
# The gradient of f(x, y) = x is (1, 0) at every point
vector_grid$u <- 1  # x-component of the gradient
vector_grid$v <- 0  # y-component of the gradient

# Plot using ggplot2
ggplot(df, aes(x = x, y = y)) +
  # Fill the first curve (circle) with semi-transparent red
  geom_contour_filled(aes(z = g1), breaks = c(-Inf, 0), fill = "red", alpha = 0.5) +
  # Fill the second curve (ellipse) with semi-transparent blue
  geom_contour_filled(aes(z = g2), breaks = c(-Inf, 0), fill = "blue", alpha = 0.5) +
  # Add contour lines for both curves
  geom_contour(aes(z = g1), breaks = 0, color = "red", linewidth = 1) +
  geom_contour(aes(z = g2), breaks = 0, color = "blue", linewidth = 1) +
  # Add the gradient field
  geom_segment(data = vector_grid,
               aes(x = x, y = y, xend = x + u * 0.2, yend = y + v * 0.2),
               arrow = arrow(length = unit(0.1, "inches")),
               color = "gray50",
               linewidth = 0.7) +
  # Mark the two candidate points (-1, 0) and (1, 0)
  geom_point(aes(x = -1, y = 0), color = "red", size = 3) +
  geom_point(aes(x = 1, y = 0), color = "blue", size = 3) +
  coord_fixed() +  # fix the aspect ratio
  theme_minimal() +
  labs(title = "Circle and ellipse with gradient field of f(x, y) = x", x = "x", y = "y")
Lagrangian Function
We define the Lagrangian function as follows:
\mathcal{L}(x,y,\mu_1,\mu_2) = x - \mu_1(x^2 + y^2 - 1) - \mu_2\left(1 - x^2 - \frac{1}{4}y^2\right)
First-Order Conditions
The first-order conditions derived from the Lagrangian are:
\begin{aligned} \frac{\partial \mathcal{L}}{\partial x} &= 1+2x(\mu_2 - \mu_1) = 0, \\ \frac{\partial \mathcal{L}}{\partial y} &= y\left(\frac{1}{2}\mu_2 - 2\mu_1\right) = 0, \\ 0 &= \mu_1(x^2 + y^2 - 1), \\ 0 &= \mu_2\left(1 - x^2 - \frac{1}{4}y^2\right). \end{aligned}
Solutions of First-Order Conditions
Case y=0: \begin{aligned} 0 &= 1+2x(\mu_2 - \mu_1), \\ 0 &= \mu_1(x^2 - 1), \\ 0 &= \mu_2(1 - x^2). \end{aligned}
If x^2 \neq 1, the last two equations force \mu_1=\mu_2=0, and then the first equation gives 1=0, which is impossible. Hence x=1 or x=-1.
- Case x=1 solving 1+2(\mu_2 - \mu_1)=0 we have \mu_1=1/2+\mu_2
- Case x=-1 solving 1-2(\mu_2 - \mu_1)=0 we have \mu_2=1/2+\mu_1
Thus we have two families of solutions:
- x=1, y=0, \mu_1=1/2+\mu_2 with \mu_1\geq 0
- x=-1, y=0, \mu_2=1/2+\mu_1 with \mu_2\geq 0
Case y\neq 0: in this case \mu_2=4\mu_1 and \left\{ \begin{aligned} 0 &= 1+6x\mu_1, \\ 0 &= \mu_1(x^2 + y^2 - 1), \\ 0 &= 4\mu_1\left(1 - x^2 - \frac{1}{4}y^2\right), \end{aligned}\right. \implies \mu_1\neq 0\implies \left\{ \begin{aligned} 0 &= x^2 + y^2 - 1, \\ 0 &= 1 - x^2 - \frac{1}{4}y^2, \end{aligned}\right. but subtracting the two equations gives \frac{3}{4}y^2 = 0, i.e. y=0, contradicting y\neq 0. Thus this case yields no solution.
The solutions to the first-order conditions yield the following critical points:
\begin{aligned} x &= 1, & y &= 0, & \mu_1 &= \frac{1}{2}+\mu_2, \\ x &= -1, & y &= 0, & \mu_2 &= \frac{1}{2}+\mu_1 \end{aligned}
Analysis of First Candidate
Compute Active Constraints
We evaluate the constraints at the first candidate (x, y) = (1, 0):
\begin{aligned} g_1(x,y) &= x^2 + y^2 - 1 = 1^2 + 0^2 - 1 = 0, & \text{(active)} \\ g_2(x,y) &= 1 - x^2 - \frac{1}{4}y^2 = 1 - 1^2 - \frac{1}{4}0^2 = 0, & \text{(active)} \end{aligned}
Gradient of Active Constraints
The gradients of the active constraints g_1 and g_2 are computed as follows:
\begin{pmatrix} \nabla g_1(x,y) \\ \nabla g_2(x,y) \end{pmatrix} = \begin{pmatrix} 2x & 2y \\ -2x & -y/2 \end{pmatrix} \;\implies\; \begin{pmatrix} 2 & 0 \\ -2 & 0 \end{pmatrix} \tag{2}
Check qualification
The gradient matrix of the active constraints in Equation 2 is not full rank, so the LICQ (linear independence constraint qualification) does not hold.
Remark. However, by choosing the solution with \mu_2 = 0, we can drop the second constraint and still obtain the same candidate point: with only g_1 active the gradient [2, 0] is full rank, the constraint qualification holds, and the analysis of this point is unchanged.
Kernel of the Gradient of Active Constraints
The kernel of the gradient can be expressed as:
\begin{aligned} \begin{pmatrix} 2 & 0 \\ -2 & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} & = \begin{pmatrix} 2z_1 \\ -2z_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ & \Downarrow \\ \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} & = \begin{pmatrix} 0 \\ \alpha \end{pmatrix} = \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{\bm{K}^a} \alpha. \end{aligned}
Reduced Hessian
The reduced Hessian (with \mu_2=0 and \mu_1=1/2) is given by:
\begin{aligned} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) &= \begin{pmatrix} 2(\mu_2 - \mu_1) & 0 \\ 0 & \frac{1}{2}(\mu_2 - 4\mu_1) \end{pmatrix} \\ & \implies \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \end{aligned}
Evaluating the quadratic form yields:
\begin{aligned} \bm{K}^{a,\top} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) \bm{K}^a &= \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = -1. \end{aligned}
Since the reduced Hessian is not positive semidefinite, this point is not a minimum.
Analysis of Second Candidate
Compute Active Constraints
Next, we evaluate the constraints at the second candidate (x, y) = (-1,0):
\begin{aligned} g_1(x,y) &= x^2 + y^2 - 1 = (-1)^2 + 0^2 - 1 = 0, & \text{(active)} \\ g_2(x,y) &= 1 - x^2 - \frac{1}{4}y^2 = 1 - (-1)^2 - \frac{1}{4}(0)^2 = 0, & \text{(active)} \end{aligned}
Gradient of Active Constraints
The gradients of the active constraints g_1 and g_2 are computed as:
\begin{pmatrix} \nabla g_1(x,y) \\ \nabla g_2(x,y) \end{pmatrix} = \begin{pmatrix} 2x & 2y \\ -2x & -y/2 \end{pmatrix} \;\implies\; \begin{pmatrix} -2 & 0 \\ 2 & 0 \end{pmatrix}
Kernel of the Gradient of Active Constraints
The kernel of the gradient can be expressed as:
\begin{aligned} \begin{pmatrix} -2 & 0 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} & = \begin{pmatrix} -2z_1 \\ 2z_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ & \Downarrow \\ \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} & = \begin{pmatrix} 0 \\ \alpha \end{pmatrix} = \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{\bm{K}^a} \alpha. \end{aligned} \tag{3}
Check qualification
The gradient matrix of the active constraints in Equation 3 is not full rank, so the LICQ (linear independence constraint qualification) does not hold.
Remark. However, by choosing the solution with \mu_1 = 0, we can drop the first constraint and still obtain the same candidate point: with only g_2 active the gradient [2, 0] is full rank, the constraint qualification holds, and the analysis of this point is unchanged.
Reduced Hessian
The reduced Hessian (with \mu_1=0 and \mu_2=1/2) is given by: \begin{aligned} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) &= \begin{pmatrix} 2(\mu_2 - \mu_1) & 0 \\ 0 & \frac{1}{2}(\mu_2 - 4\mu_1) \end{pmatrix} \\ & \implies \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{4} \end{pmatrix}. \end{aligned}
Evaluating the quadratic form yields:
\begin{aligned} \bm{K}^{a,\top} \nabla^2_{(x,y)} \mathcal{L}(x,y,\mu_1,\mu_2) \bm{K}^a &= \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{4} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{4}. \end{aligned}
Since the reduced Hessian is positive definite, this point is a minimum.
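The same brute-force scan of the feasible region, now minimizing x, lands on the second candidate (-1, 0):

```r
pts  <- expand.grid(x = seq(-2, 2, length.out = 401),
                    y = seq(-2, 2, length.out = 401))
feas <- subset(pts, x^2 + y^2 - 1 >= 0 & 1 - x^2 - y^2 / 4 >= 0)
feas[which.min(feas$x), ]  # approximately x = -1, y = 0
```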
Example 5
Consider the problem
\begin{aligned} \textrm{minimize:}\quad & f(x,y) = 6 + 2y-x^2 \\ \textrm{equality:}\quad & h(x,y) = y - x^2 = 0 \end{aligned}
Lagrangian function
\mathcal{L}(x,y,\lambda) = 6 + 2y-x^2 - \lambda(y - x^2)
First-Order Conditions
\begin{aligned} \frac{\partial\mathcal{L}}{\partial x} & = 2x(\lambda-1) = 0\\ \frac{\partial\mathcal{L}}{\partial y} & = 2 - \lambda = 0\\ 0 & = y - x^2 \end{aligned}
Solution of the First-Order Conditions
- \lambda = 2
- x=0
- y=0
Analysis of the Candidate
Gradient of the constraints
\nabla (y - x^2) = \begin{pmatrix} -2x & 1 \end{pmatrix},\quad \implies\quad \begin{pmatrix} 0 & 1 \end{pmatrix}
Check Qualification
The single gradient is nonzero, hence linearly independent: the constraint qualification (LICQ) holds.
Kernel of the Gradient of Active Constraints
\begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = z_2 = 0,\quad \implies\quad \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 \\ 0 \end{pmatrix}}_{\bm{K}^a} \alpha
Reduced Hessian
\nabla^2_{(x,y)}\mathcal{L}(x,y,\lambda) = \begin{pmatrix} 2(\lambda - 1) & 0 \\ 0 & 0 \end{pmatrix} \quad\implies\quad \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}
\bm{K}^{a,\top} \nabla^2_{(x,y)}\mathcal{L}(0,0,2) \bm{K}^{a} = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = (2)
The reduced Hessian is positive definite, so the point is a minimum.
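As a sanity check, the constraint can be eliminated directly: substituting y = x^2 into the objective gives
f(x, x^2) = 6 + 2x^2 - x^2 = 6 + x^2,
which is minimized at x = 0 (hence y = 0) with value f = 6, so the point is in fact the global minimum.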