Minimization of a Functional with Algebraic (Path) Constraints
We consider a variational problem in which we seek to minimize an objective functional subject to a path constraint—an algebraic constraint that must hold continuously along the entire path of the function x(t).
Problem Formulation
The problem can be stated as follows:
\begin{aligned} \text{minimize} \quad & \mathcal{F}(x) = \int_a^b L(x, x^\prime, t) \, \mathrm{d}t, \\ \text{subject to} \quad & g(x, x^\prime, t) = 0. \end{aligned}
where L(x, x^\prime, t) is the Lagrangian, which depends on x(t), its derivative x^\prime(t), and t; and g(x, x^\prime, t) = 0 represents the algebraic (path) constraint that must be satisfied at each point t \in [a, b].
Defining the Feasible Space
We define a feasible space \mathbb{W} for x, consisting of all continuously differentiable functions that satisfy the path constraint g(x, x^\prime, t) = 0 at every point of the interval (the endpoints are left free, so no boundary conditions are imposed):
\mathbb{W} = \left\{ x \in C^1([a, b]) \mid g(x, x^\prime, t) = 0 \; \text{for all } t \in [a, b] \right\}.
The goal, then, is to find an x \in \mathbb{W} that minimizes \mathcal{F}(x).
Feasible Directions and Conditions
To analyze feasible directions, we perturb x to x + \alpha \delta x, where \delta x is a variation such that the perturbed function remains feasible:
g\left(x(t) + \alpha\,\delta x(t),\; x^\prime(t) + \alpha\,\delta x^\prime(t),\; t\right) = 0, \quad \text{for all } t \in [a, b].
To obtain the variation of \mathcal{F}(x), we compute:
\begin{aligned} \delta \mathcal{F}(x; \delta x) &= \int_a^b \left( \frac{\partial L}{\partial x}(x, x^\prime, t) \delta x + \frac{\partial L}{\partial x^\prime}(x, x^\prime, t) \delta x^\prime \right) \, \mathrm{d}t \\ &= \int_a^b \left( \frac{\partial L}{\partial x}(x, x^\prime, t) - \frac{\mathrm{d}}{\mathrm{d}t} \left( \frac{\partial L}{\partial x^\prime}(x, x^\prime, t) \right) \right) \delta x \, \mathrm{d}t \\ &\quad + \left[ \frac{\partial L}{\partial x^\prime}(x, x^\prime, t) \delta x \right]_{t = a}^{t = b}. \end{aligned}
However, differentiating the constraint at \alpha = 0 shows that a feasible variation \delta x must also satisfy the linearized constraint:
\delta g(x, x^\prime, t) = \frac{\partial g}{\partial x}(x, x^\prime, t) \delta x + \frac{\partial g}{\partial x^\prime}(x, x^\prime, t) \delta x^\prime = 0.
Working directly with this coupled condition on the variations is impractical, so we instead discretize the problem with finite differences.
Discretization with Finite Differences
To discretize the problem, we divide the interval [a, b] into n subintervals of width h = \frac{b - a}{n} and define points t_k = a + k h for k = 0, 1, \dots, n. Let x_k approximate the solution at each point t_k, so that x_k \approx x(t_k).
First Derivative Approximation: We approximate the first derivative x^\prime(t) at each midpoint t_{k+1/2} using a centered finite difference formula:
x^\prime(t_{k+1/2}) = \frac{x(t_{k+1}) - x(t_k)}{h} + \mathcal{O}(h^2)
(check using Taylor series)
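A quick Taylor check confirms the order. Writing x_m = x(t_{k+1/2}) and expanding x(t_{k+1}) = x(t_{k+1/2} + h/2) and x(t_k) = x(t_{k+1/2} - h/2) about t_{k+1/2}:
\begin{aligned} x(t_{k+1}) &= x_m + \frac{h}{2} x^\prime_m + \frac{h^2}{8} x^{\prime\prime}_m + \frac{h^3}{48} x^{\prime\prime\prime}_m + \mathcal{O}(h^4), \\ x(t_k) &= x_m - \frac{h}{2} x^\prime_m + \frac{h^2}{8} x^{\prime\prime}_m - \frac{h^3}{48} x^{\prime\prime\prime}_m + \mathcal{O}(h^4). \end{aligned}
Subtracting and dividing by h, the even-order terms cancel:
\frac{x(t_{k+1}) - x(t_k)}{h} = x^\prime(t_{k+1/2}) + \frac{h^2}{24} x^{\prime\prime\prime}(t_{k+1/2}) + \mathcal{O}(h^4).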
Midpoint Approximation: We approximate the midpoint value x(t_{k+1/2}) by averaging the left and right values:
x(t_{k+1/2}) = \frac{x(t_{k+1})+x(t_k)}{2} + \mathcal{O}(h^2)
(check using Taylor series)
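Expanding x(t_{k+1}) = x(t_{k+1/2} + h/2) and x(t_k) = x(t_{k+1/2} - h/2) about t_{k+1/2}, the odd-order terms cancel upon adding:
\frac{x(t_{k+1}) + x(t_k)}{2} = x(t_{k+1/2}) + \frac{h^2}{8} x^{\prime\prime}(t_{k+1/2}) + \mathcal{O}(h^4).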
Applying the Constraint: We impose the path constraint g(x, x^\prime, t) = 0 at each (mid-)point t_{k+1/2} by substituting the finite difference approximation:
g\left( \frac{x_{k+1}+x_k}{2}, \frac{x_{k+1} - x_k}{h}, \frac{t_{k+1}+t_k}{2}\right) = 0.
This transforms the continuous path constraint into a set of discrete constraints, one at each point t_{k+1/2}.
Discrete Formulation of the Objective Functional
To approximate the integral \mathcal{F}(x), we use the midpoint rule:
\begin{aligned} \mathcal{F}_h(\bm{x}) &= h \sum_{k=0}^{n-1} L\left(x_{k+1/2},x^\prime_{k+1/2}, t_{k+1/2}\right), \\[1em] x_{k+1/2}&=\frac{x_{k+1}+x_k}{2},\quad x^\prime_{k+1/2}=\frac{x_{k+1}-x_k}{h},\quad t_{k+1/2}=\frac{t_{k+1}+t_k}{2}, \end{aligned}
Our goal is now to minimize \mathcal{F}_h(\bm{x}) with respect to \bm{x} = (x_0, x_1, \dots, x_n), subject to the discrete constraints g(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}) = 0 for k = 0, 1, \dots, n-1.
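As a concrete sketch of how this discrete problem can be solved in practice, consider the hypothetical model problem L(x, x^\prime, t) = (x^\prime)^2/2 with path constraint g(x, x^\prime, t) = x - t^2 on [0, 1] (an assumption chosen purely for illustration; its constrained minimum value is \int_0^1 \tfrac{1}{2}(2t)^2 \, \mathrm{d}t = 2/3). One can hand \mathcal{F}_h and the midpoint constraints to a generic constrained optimizer such as SciPy's SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

# Grid: n subintervals on [a, b], nodes t_k and midpoints t_{k+1/2}
a, b, n = 0.0, 1.0, 20
h = (b - a) / n
t = np.linspace(a, b, n + 1)
tm = 0.5 * (t[:-1] + t[1:])

def F_h(x):
    """Midpoint-rule objective for L(x, x', t) = (x')^2 / 2."""
    dx = (x[1:] - x[:-1]) / h          # x'_{k+1/2}
    return h * np.sum(0.5 * dx**2)

def g_h(x):
    """Discrete path constraint g = x - t^2 imposed at the midpoints."""
    xm = 0.5 * (x[:-1] + x[1:])        # x_{k+1/2}
    return xm - tm**2

res = minimize(F_h, np.zeros(n + 1), method="SLSQP",
               constraints={"type": "eq", "fun": g_h})
print(res.fun)                          # close to the exact value 2/3
print(np.max(np.abs(res.x - t**2)))     # the nodes x_k track t_k^2
```

The constraint is enforced only at the n midpoints, so one degree of freedom in \bm{x} remains, and the optimizer uses it to minimize \mathcal{F}_h.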
Lagrange Multiplier Method for Discrete Problem
To incorporate the constraints, we attach a Lagrange multiplier \lambda_{k+1/2} to each midpoint constraint; scaling the multipliers by h is convenient for the continuum limit below. The discrete Lagrangian becomes:
\begin{aligned} \mathcal{L}_h(\bm{x}, \bm{\lambda}) &= \mathcal{F}_h(\bm{x}) - h\sum_{k=0}^{n-1} \lambda_{k+1/2} \, g\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) \\ & = h\sum_{k=0}^{n-1} \left[ L\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) - \lambda_{k+1/2} \, g\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) \right] \\ & = h\sum_{k=0}^{n-1} M\left(x_{k+1/2}, x^\prime_{k+1/2}, \lambda_{k+1/2}, t_{k+1/2}\right) \end{aligned}
where
M(x,x^\prime,\lambda,t) = L(x,x^\prime,t)-\lambda g(x,x^\prime,t)
The nonlinear system
To find stationary points of \mathcal{L}_h(\bm{x}, \bm{\lambda}), we differentiate with respect to each x_k and each multiplier, which leads to the following system of equations:
Derivative with respect to x_k:
\frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_k} = 0,\qquad k=0,1,\ldots,n
Constraint equations for each k:
g\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) = 0,\qquad k=0,1,\ldots,n-1
In detail:
\begin{aligned} \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_0} & = h\frac{\partial}{\partial x_0} M\left(x_{1/2}, x^\prime_{1/2}, \lambda_{1/2}, t_{1/2}\right) \\ & = \frac{h}{2}\frac{\partial M}{\partial x} \left(x_{1/2}, x^\prime_{1/2}, \lambda_{1/2}, t_{1/2}\right) -\frac{\partial M}{\partial x^\prime} \left(x_{1/2}, x^\prime_{1/2}, \lambda_{1/2}, t_{1/2}\right) \\ \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_n} & = h\frac{\partial}{\partial x_n} M\left(x_{n-1/2}, x^\prime_{n-1/2}, \lambda_{n-1/2}, t_{n-1/2}\right) \\ & = \frac{h}{2}\frac{\partial M}{\partial x} \left(x_{n-1/2}, x^\prime_{n-1/2}, \lambda_{n-1/2}, t_{n-1/2}\right) +\frac{\partial M}{\partial x^\prime} \left(x_{n-1/2}, x^\prime_{n-1/2}, \lambda_{n-1/2}, t_{n-1/2}\right) \end{aligned}
For k=1,2,\ldots,n-1: \begin{aligned} \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_k} & = h\frac{\partial}{\partial x_k} M\left(x_{k-1/2}, x^\prime_{k-1/2}, \lambda_{k-1/2}, t_{k-1/2}\right) + h\frac{\partial}{\partial x_k} M\left(x_{k+1/2}, x^\prime_{k+1/2}, \lambda_{k+1/2}, t_{k+1/2}\right) \\ & = \frac{h}{2}\bigg[ \frac{\partial M}{\partial x} \left(x_{k-1/2}, x^\prime_{k-1/2}, \lambda_{k-1/2}, t_{k-1/2}\right)+ \frac{\partial M}{\partial x} \left(x_{k+1/2}, x^\prime_{k+1/2}, \lambda_{k+1/2}, t_{k+1/2}\right) \bigg] \\ & + \bigg[ \frac{\partial M}{\partial x^\prime} \left(x_{k-1/2}, x^\prime_{k-1/2}, \lambda_{k-1/2}, t_{k-1/2}\right)- \frac{\partial M}{\partial x^\prime} \left(x_{k+1/2}, x^\prime_{k+1/2}, \lambda_{k+1/2}, t_{k+1/2}\right) \bigg] \end{aligned}
Consider now the discrete nonlinear system:
\left\{ \begin{aligned} \dfrac{1}{h} \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_k}&=0,\qquad &k&=1,2,\ldots,n-1 \\ \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_0}&=0 \\ \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_n}&=0 \\ g\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) &= 0,\qquad & k&=0,1,\ldots,n-1 \end{aligned} \right.
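To make the system concrete, the following sketch assembles exactly these stationarity and constraint equations for the same illustrative model problem L = (x^\prime)^2/2, g = x - t^2 on [0, 1] (an assumed example, so that \partial M/\partial x = -\lambda and \partial M/\partial x^\prime = x^\prime) and hands them to a root finder. The interior equations are used without the 1/h scaling, which leaves the roots unchanged:

```python
import numpy as np
from scipy.optimize import fsolve

a, b, n = 0.0, 1.0, 20
h = (b - a) / n
t = np.linspace(a, b, n + 1)
tm = 0.5 * (t[:-1] + t[1:])

def residual(z):
    # Unknowns: nodes x_0..x_n, then one multiplier per midpoint constraint
    x, lam = z[:n + 1], z[n + 1:]
    dx = (x[1:] - x[:-1]) / h    # dM/dx' = x' at the midpoints
    Mx = -lam                    # dM/dx = -lambda, since M = x'^2/2 - lam*(x - t^2)
    r = np.empty(2 * n + 1)
    r[0] = 0.5 * h * Mx[0] - dx[0]                              # dL_h/dx_0
    r[1:n] = 0.5 * h * (Mx[:-1] + Mx[1:]) + (dx[:-1] - dx[1:])  # interior nodes
    r[n] = 0.5 * h * Mx[-1] + dx[-1]                            # dL_h/dx_n
    r[n + 1:] = 0.5 * (x[:-1] + x[1:]) - tm**2                  # constraints
    return r

z = fsolve(residual, np.zeros(2 * n + 1))
x = z[:n + 1]
print(np.max(np.abs(x - t**2)))  # the nodes track x(t) = t^2
```

For this model problem the system is linear, so the root finder converges quickly; for a genuinely nonlinear L or g the same residual structure applies with a Newton-type iteration.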
As h \to 0, the entries of this finite difference system correspond to the following approximations:
\begin{aligned} \dfrac{1}{h} \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_k} & \approx \left[ \frac{\partial M(x,x^\prime,\lambda,t)}{\partial x} -\dfrac{\mathrm{d}}{\mathrm{d}t} \frac{\partial M(x,x^\prime,\lambda,t)}{\partial x^\prime} \right]_{t=t_k} \\ \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_0} & \approx -\frac{\partial M}{\partial x^\prime} \left(x(a), x^\prime(a), \lambda(a), a\right) \\ \frac{\partial \mathcal{L}_h(\bm{x}, \bm{\lambda})}{\partial x_n} & \approx \frac{\partial M}{\partial x^\prime} \left(x(b), x^\prime(b), \lambda(b), b\right) \\ g\left(x_{k+1/2}, x^\prime_{k+1/2}, t_{k+1/2}\right) &\approx g\left(x, x^\prime, t\right)\Big|_{t=t_{k+1/2}} \end{aligned}
Thus, the finite difference approximation can be viewed as a discretization of the following boundary value problem (BVP):
\left\{ \begin{aligned} \frac{\partial M(x,x^\prime,\lambda,t)}{\partial x} -\dfrac{\mathrm{d}}{\mathrm{d}t} \frac{\partial M(x,x^\prime,\lambda,t)}{\partial x^\prime} &=0 \\ -\frac{\partial M}{\partial x^\prime} \left(x(a), x^\prime(a), \lambda(a), a\right) &=0\\ \frac{\partial M}{\partial x^\prime} \left(x(b), x^\prime(b), \lambda(b), b\right) &=0 \\ g\left(x, x^\prime, t\right) &=0 \end{aligned} \right.
Using the Analogy
In this section, we develop an analogy for handling constrained variational problems, where we aim to minimize an objective functional subject to an algebraic (or path) constraint. This technique introduces an augmented functional by incorporating a Lagrange multiplier to enforce the constraint continuously over the interval.
Problem Statement
We start with the following constrained minimization problem: \begin{aligned} \text{minimize} \quad & \mathcal{F}(x) = \int_a^b L(x, x^\prime, t) \, \mathrm{d}t, \\ \text{subject to} \quad & g(x, x^\prime, t) = 0. \end{aligned} where:
\mathcal{F}(x) is the objective functional, with L(x,x',t) as the Lagrangian depending on the function x(t), its derivative x^\prime(t), and the independent variable t.
The path constraint g(x,x',t) = 0 must hold continuously over the interval [a,b].
Formulating the Augmented Functional
To incorporate the constraint into the minimization process, we construct an augmented functional \mathcal{L}(x, \lambda) by introducing a Lagrange multiplier function \lambda(t) that enforces the constraint g(x, x', t) = 0 at every point in the interval:
\mathcal{L}(x,\lambda) = \int_a^b \left[ L(x, x^\prime, t)-\lambda g(x, x^\prime, t) \right] \mathrm{d}t = \int_a^b M(x, x^\prime, \lambda, t)\, \mathrm{d}t
where we define the augmented Lagrangian (or the modified integrand) as:
M(x, x', \lambda, t) = L(x, x', t) - \lambda g(x, x', t).
Deriving the Necessary Conditions via First Variation
To find stationary points of \mathcal{L}(x, \lambda), we calculate the first variation \delta \mathcal{L}(x, \lambda; \delta x, \delta \lambda) with respect to x and \lambda:
\delta\mathcal{L}(x,\lambda;\delta x,\delta\lambda) = \int_a^b \left[ \frac{\partial M}{\partial x}\delta x+ \frac{\partial M}{\partial x^\prime}\delta x^\prime + \frac{\partial M}{\partial\lambda}\delta \lambda \right] \mathrm{d}t
To further simplify, we integrate by parts on the term involving \delta x^\prime:
\delta\mathcal{L}(x,\lambda;\delta x,\delta\lambda) = \int_a^b \left[ \frac{\partial M}{\partial x} -\dfrac{\mathrm{d}}{\mathrm{d}t} \frac{\partial M}{\partial x^\prime} \right]\delta x+ \frac{\partial M}{\partial\lambda}\delta \lambda \, \mathrm{d}t + \frac{\partial M}{\partial x^\prime}\delta x\Big|_{t=a}^{t=b}
Applying the Fundamental Lemma of Calculus of Variations
To ensure that \delta \mathcal{L}(x, \lambda; \delta x, \delta \lambda) = 0 for arbitrary variations \delta x and \delta \lambda, we apply the Fundamental Lemma of Calculus of Variations. This yields the following system of Euler-Lagrange equations:
\left\{ \begin{aligned} \frac{\partial M}{\partial x}(x, x^\prime, \lambda,t) -\dfrac{\mathrm{d}}{\mathrm{d}t} \frac{\partial M}{\partial x^\prime}(x, x^\prime, \lambda, t) &=0 \\ -\frac{\partial M}{\partial x^\prime}(x(a), x^\prime(a), \lambda(a), a) &=0 \\ \frac{\partial M}{\partial x^\prime}(x(b), x^\prime(b), \lambda(b), b) &=0 \\ \frac{\partial M}{\partial\lambda}(x, x^\prime, \lambda, t) = -g(x, x^\prime, t) &= 0 \end{aligned} \right.
This is precisely the system obtained above through the finite difference approximation.
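As a small worked illustration (an assumed example, not part of the original problem), take L(x, x^\prime, t) = (x^\prime)^2/2 with the constraint g(x, x^\prime, t) = x^\prime - 2t on [a, b]. Then M = (x^\prime)^2/2 - \lambda (x^\prime - 2t), so \partial M/\partial x = 0 and \partial M/\partial x^\prime = x^\prime - \lambda, and the system becomes:
\left\{ \begin{aligned} -\dfrac{\mathrm{d}}{\mathrm{d}t}\left(x^\prime - \lambda\right) &= 0 \\ x^\prime(a) - \lambda(a) &= 0 \\ x^\prime(b) - \lambda(b) &= 0 \\ x^\prime - 2t &= 0 \end{aligned} \right.
The first three equations force x^\prime - \lambda \equiv 0, and the constraint then yields \lambda(t) = x^\prime(t) = 2t and x(t) = t^2 + C; the additive constant C remains free because neither L nor g depends on x itself.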