Minimization of a Functional
Introduction
A classical problem in minimization is the brachistochrone, the curve of fastest descent: determining the trajectory along which a point mass moves from a point A to a point B in the least time, under the action of gravity alone.
Code
library(ggplot2)

# Parameters for the cycloid (optimal curve)
r <- 1
theta <- seq(0, pi, length.out = 500)

# Parametric coordinates of the cycloid
x_cycloid <- r * (theta - sin(theta))
y_cycloid <- -r * (1 - cos(theta))

# Coordinates of the cycloid's final point
x_end <- max(x_cycloid)
y_end <- min(y_cycloid)

# Straight line between (0,0) and (x_end, y_end)
x_line <- seq(0, x_end, length.out = 500)
y_line <- seq(0, y_end, length.out = 500)

# Parabola passing through (0,0) and (x_end, y_end)
a_parabola <- y_end / (x_end^2) # Coefficient for the parabola
x_parabola <- seq(0, x_end, length.out = 500)
y_parabola <- a_parabola * x_parabola^2 # Parabola equation

# Quadratic Bézier curve passing through (0,0) and (x_end, y_end)
# We choose a control point at 1.5 times the height of the endpoint
control_x <- x_end / 2
control_y <- 1.5 * y_end

# Parameter for the Bézier curve
t <- seq(0, 1, length.out = 500)
x_bezier <- (1 - t)^2 * 0 + 2 * (1 - t) * t * control_x + t^2 * x_end
y_bezier <- (1 - t)^2 * 0 + 2 * (1 - t) * t * control_y + t^2 * y_end

# Create data frames for ggplot
data_cycloid <- data.frame(x = x_cycloid, y = y_cycloid, curve = "Cycloid (optimal)")
data_line <- data.frame(x = x_line, y = y_line, curve = "Straight line")
data_parabola <- data.frame(x = x_parabola, y = y_parabola, curve = "Parabola")
data_bezier <- data.frame(x = x_bezier, y = y_bezier, curve = "Bézier spline")

# Combine all data into a single data frame
data_all <- rbind(data_cycloid, data_line, data_parabola, data_bezier)
# Create the plot
ggplot(data_all, aes(x = x, y = y, color = curve)) +
geom_path(linewidth = 1.2) +
labs(title = "Brachistochrone Problem: Curve Comparison",
subtitle = "Cycloid (optimal curve) vs Straight Line vs Parabola vs Bézier Spline",
x = "x-axis",
y = "y-axis") +
theme_minimal() +
scale_color_manual(values = c("Cycloid (optimal)" = "blue",
"Straight line" = "red",
"Parabola" = "green",
"Bézier spline" = "purple")) +
coord_fixed() + # Maintains correct aspect ratio
# Add points using annotate
annotate("point", x = 0, y = 0, color = "black", size = 4) + # Start point
annotate("point", x = x_end, y = y_end, color = "black", size = 4) # End point
Mathematically, the problem is to determine the curve \mathcal C: [a,b] \to\Bbb{R} with \mathcal C(a) = y_a and \mathcal C(b) = y_b that minimizes the functional T(\mathcal{C}) representing the travel time, that is
\textrm{minimize } T(\mathcal{C}) \qquad \textrm{for all possible curves } \mathcal{C}
A function takes a number as input and returns a number, like f(x) = x \mathrm{e}^x. A functional instead takes a function as input and returns a number; an example is
\mathcal{F}(x) = \int_a^b x(t)\,\mathrm{d}t
where x is a function. Considering for example x(t) = t^2 the previous functional becomes
\mathcal{F}(x) = \int_a^b t^2\,\mathrm{d}t = \frac{t^3}{3}\Big|_a^b= \frac{b^3-a^3}{3}
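To make this concrete, here is a minimal numerical sketch (the bounds a = 0 and b = 2 are assumed for illustration) comparing quadrature of the functional against the closed form:
Code
# Numerical check (assumed bounds a = 0, b = 2): evaluating the functional
# F(x) = integral of x(t) dt at x(t) = t^2 must give (b^3 - a^3)/3
a <- 0; b <- 2
integrate(function(t) t^2, lower = a, upper = b)$value # numerical value
(b^3 - a^3) / 3 # closed form: 8/3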
Another example of functional can be
\mathcal{G}(z) = z'(0) \int_0^1 z^2(t)\,\mathrm{d}t
and this expression can be evaluated for every generic function z(t).
The question now is how to define a minimum for a functional; to do so we first recall how a minimum is defined for an ordinary function. A function
f: A\subseteq\Bbb{R}\mapsto\Bbb{R}
has a minimum at the point x^\star if
f(x^\star) \leq f(x) \qquad \forall x \in A
Similarly, given a functional \mathcal{F}(x), a point x^\star (which is in fact a function) is a minimum if
\mathcal{F}(x^\star) \leq \mathcal{F}(x) \qquad \forall x \in ?
In this case we have to specify the class (function space) of functions we are considering; for example, x can be a function continuous on the domain [a,b], so we can define
x\in C([a,b]) = \big\{ g:[a,b]\mapsto \Bbb{R} \ | \ g \textrm{ is continuous} \big\}
In general changing the domain of the functional \mathcal{F} may change the problem.
Example 1 (domain change) Consider the function f(x) = x^2+2: its roots can be computed if we allow complex solutions z\in\Bbb{C} (in fact z = \pm \mathrm{i}\sqrt 2), while if the solutions are restricted to the real set z\in\Bbb{R} no solution exists.
Considering now the functional \mathcal{F} defined as
\mathcal{F}(x) = \int_{-1}^1 \big(x(t) - |t|\big)^2\, \mathrm{d}t
The term |t| introduces a cusp at t=0 where the function is not differentiable. If we minimize the functional over the space of continuous functions x\in C([-1,1]), the solution is the function x^\star(t) = |t|; in fact
\mathcal{F}(x^\star) = \int_{-1}^1 \big(|t|-|t|\big)^2 \, \mathrm{d}t = \int_{-1}^1 0\, \mathrm{d}t = 0
Choosing any other continuous function will result in a functional with a positive value.
Consider now minimizing the functional over the space of functions with continuous first derivative, that is x\in C^1([-1,1]). In this case |t|\notin C^1([-1,1]) (due to the cusp). An approximation of the function |t| that is continuous with also continuous first derivative is the function
x_\varepsilon(t) = \begin{cases} t \qquad & t \geq \varepsilon \\ \frac 1 2 \left(\frac{t^2}\varepsilon + \varepsilon\right) \qquad & -\varepsilon < t< \varepsilon \\ -t & t \leq- \varepsilon \end{cases}
Letting \varepsilon\to 0, the value of the functional tends to zero; however, the limit function |t| does not belong to C^1([-1,1]), so the infimum is not attained and the minimization problem has no solution in C^1([-1,1]). In fact the functional evaluated at x_\varepsilon becomes
\mathcal{F}(x_\varepsilon) = \int_{-\varepsilon}^\varepsilon \left( \frac{t^2}{2\varepsilon} + \frac \varepsilon 2 - |t| \right)^2\, \mathrm{d}t = \frac{\varepsilon^3}{10}
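As a quick sanity check of this computation, a short sketch (with an assumed sample value \varepsilon = 0.1) compares the numerically evaluated functional with \varepsilon^3/10:
Code
# Numerical check of F(x_eps) = eps^3 / 10 for a sample eps
eps <- 0.1
x_eps <- function(t) ifelse(abs(t) >= eps, abs(t), (t^2 / eps + eps) / 2)
integrand <- function(t) (x_eps(t) - abs(t))^2
integrate(integrand, lower = -1, upper = 1)$value # about 1e-4
eps^3 / 10 # exactly 1e-4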
Some examples of spaces of functions:
- Functions f:[a,b]\mapsto\Bbb{R} continuous
- Functions f:[a,b]\mapsto\Bbb{R} which are C^k (continuous with derivatives up to order k).
- Functions f:[a,b]\mapsto\Bbb{R} piecewise continuous
Less trivial examples of spaces of functions:
Space L_1([a,b]) of functions f:[a,b]\mapsto\Bbb{R} which are absolutely integrable,
\int_a^b | f(x) |\,\mathrm{d}x < \infty
[the integral must exist and be finite]
Space L_2([a,b]) of functions f:[a,b]\mapsto\Bbb{R} which are square integrable,
\int_a^b | f(x) |^2\,\mathrm{d}x < \infty
[the integral must exist and be finite]
Example 2 Let f:[0,1]\mapsto\Bbb{R}, f(x)=1/x; it is not in L_1([0,1]), in fact
\int_0^1 \dfrac{1}{x}\,\mathrm{d}x \to \infty
The function g:[0,1]\mapsto\Bbb{R}, g(x)=1/\sqrt{x}, is in L_1([0,1]), in fact
\int_0^1 \dfrac{1}{\sqrt{x}}\,\mathrm{d}x = \left[2\sqrt{x}\right]_{x=0}^{x=1} = 2<\infty
however this function g(x) is not in L_2([0,1]), in fact
\int_0^1 g(x)^2\,\mathrm{d}x = \int_0^1 \dfrac{1}{(\sqrt{x})^2}\,\mathrm{d}x = \int_0^1 \dfrac{1}{x}\,\mathrm{d}x \to \infty
The function h(x)=x^{-1/4} is instead in L_2([0,1]), in fact
\int_0^1 h(x)^2\,\mathrm{d}x = \int_0^1 \big(x^{-1/4}\big)^2\,\mathrm{d}x = \int_0^1 \dfrac{1}{\sqrt{x}}\,\mathrm{d}x = \left[2\sqrt{x}\right]_{x=0}^{x=1} = 2<\infty
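These membership claims can be checked numerically; the sketch below uses R's integrate() (the truncated lower bounds for the divergent case are illustrative assumptions):
Code
# g(x) = 1/sqrt(x) is in L1([0,1]): the integral converges to 2
integrate(function(x) 1 / sqrt(x), lower = 0, upper = 1)$value
# g is not in L2([0,1]): integrating g^2 = 1/x from eps to 1 grows like -log(eps)
sapply(c(1e-2, 1e-4, 1e-6), function(eps)
  integrate(function(x) 1 / x, lower = eps, upper = 1)$value)
# h(x) = x^(-1/4) is in L2([0,1]): the integral of h^2 converges to 2
integrate(function(x) x^(-1/2), lower = 0, upper = 1)$value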
Analogies with linear algebra
To solve the problem of minimizing a functional we can exploit some analogies with linear algebra. The domain of the functional can in fact be seen as a function space \Bbb{V}, analogous to a vector space, in which the vectors are functions and the scalars are real values. With this definition, examples of function spaces are
\begin{aligned} \Bbb{V}_1 & = \big\{ f : [0,1] \mapsto\Bbb{R}\textrm{ continuous}\big\} \\ \Bbb{V}_2 & = \big\{ f: [a,b]\mapsto\Bbb{R}\textrm{ such that } f\in C^k([a,b]) \big\} \end{aligned}
with a,b \in \Bbb{R} and k\in \Bbb{N}.
Given two functions f,g \in \Bbb{V}, members of the same function space, every linear combination of them determines a function that is still in the space:
\alpha f(x) + \beta g(x) \in \Bbb{V} \qquad \forall \alpha,\beta \in \Bbb{R}, \ f,g\in\Bbb{V}
As an example, consider the two continuous functions f(x) = x^2 and g(x) = \sin x: the function 2f(x) + \frac 13 g(x) = 2x^2 +\frac 13\sin x is clearly still continuous. In general this means that the function space is closed with respect to the operations of function addition and multiplication by a scalar.
Scalar product
In linear algebra, given two vectors \bm{v}_1, \bm{v}_2, there exists the bilinear scalar product operator \langle \bm{v}_1,\bm{v}_2 \rangle = \bm{v}_1 \cdot \bm{v}_2 satisfying the following rules:
\langle \alpha \bm{v} + \beta \bm{w}, \bm{z}\rangle = \langle \bm{z} , \alpha \bm{v} + \beta \bm{w}\rangle = \alpha \langle \bm{v},\bm{z}\rangle + \beta \langle \bm{w},\bm{z}\rangle
for all \alpha,\beta \in \Bbb{R}, \bm{v},\bm{w},\bm{z} \in \Bbb{V} and
\langle \bm{v},\bm{v}\rangle \geq 0 \quad \textrm{ and } \quad \langle \bm{v},\bm{v} \rangle = 0 \quad \Leftrightarrow \quad \bm{v} = \bm{0}
Definitions of scalar product can also exist in the function space \Bbb{V}, such as the one presented here:
\langle f,g\rangle = \int_0^1 f(x)g(x)\, \mathrm{d}x \tag{1}
This is not the only function that can serve as a scalar product between functions; in this case the definition holds for the function space
\Bbb{V} = \{ f:[0,1] \mapsto \Bbb{R} \textrm{ integrable} \}
We can prove that this definition meets the requirements stated for the scalar product by using the linearity of the integral, as follows:
\begin{aligned} \langle \alpha f(\cdot) + \beta g(\cdot),h \rangle & = \int_0^1 \big(\alpha f(x) + \beta g(x)\big)h(x) \, \mathrm{d}x \\ & = \alpha \int_0^1 f(x)h(x) \, \mathrm{d}x + \beta \int_0^1 g(x)h(x) \, \mathrm{d}x \\ & = \alpha \langle f,h \rangle + \beta \langle g,h \rangle \end{aligned}
and also
\langle f,f \rangle = \int_0^1 f(x) f(x) \, \mathrm{d}x = \int_0^1 f^2(x)\, \mathrm{d}x \geq 0
and in particular the scalar product \langle f,f\rangle equals 0 if and only if the function f is identically null, i.e. f(x)=0 for all x in its domain (in this case [0,1]).
Norm
In linear algebra it’s also defined the norm of a vector as
\|\bm{v}\| := \sqrt{\bm{v} \cdot \bm{v}}
and this expression is used to compute, from a vector, a single nonnegative value (in particular \|\bm{v}\| = 0 if and only if \bm{v} = \bm{0}). This definition carries over to function spaces: considering the scalar product defined (as an example) in Equation 1, one possible norm for functions is
\|f(x)\| = \sqrt{\langle f(x),f(x)\rangle} = \sqrt{\int_0^1f^2(x) \, \mathrm{d}x}
In this case we can clearly see that the norm satisfies \|f\| \geq 0 (and is zero only when the function f is identically null).
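The sketch below implements this scalar product and induced norm by numerical quadrature on [0,1], using the same example functions f(x) = x^2 and g(x) = sin(x) introduced above:
Code
# Scalar product and induced norm on [0,1] via numerical quadrature
inner <- function(f, g) integrate(function(x) f(x) * g(x), lower = 0, upper = 1)$value
fnorm <- function(f) sqrt(inner(f, f))
f <- function(x) x^2
g <- function(x) sin(x)
inner(f, g) # <f, g>
fnorm(f)    # ||f|| = sqrt(1/5) on [0,1]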
Example 3 (spaces of functions) A space of functions can be the one of f:[a,b]\mapsto \Bbb{R} such that f(x) is continuous, or for example f:[a,b]\mapsto \Bbb{R} with f\in C^k([a,b]) (for k\in \Bbb{N}). The same can be said for piecewise continuous functions.
Less trivially, the set of f:[a,b]\mapsto \Bbb{R} that are absolutely integrable is also a space of functions, i.e. all the functions f such that
\int_a^b |f(x)|\, \mathrm{d}x < \infty
In general we define L^p([a,b]) as the space of p-integrable functions, i.e. such that
f\in L^p([a,b]) \qquad \iff \qquad \int_a^b |f(x)|^p\, \mathrm{d}x < \infty
First variation
Given a functional \mathcal{J}: \Bbb{V}\mapsto \Bbb{R} defined on the function space \Bbb{V}, in order to determine the first-order necessary condition for a minimum we have to define a concept of derivative for functionals, called the first variation (or directional derivative).
To determine the first variation of the functional \mathcal{J} with respect to the function x\in\Bbb{V}, we evaluate the functional at the perturbed function x(\cdot) + \alpha \eta(\cdot), where \eta\in\Bbb{V} (which can be regarded as the direction of the derivative) and \alpha \in \Bbb{R}. We can then define the first variation \delta \mathcal J(x;\cdot):\Bbb{V}\mapsto \Bbb{R} as the functional satisfying the following relation:
\mathcal{J}\big(x(\cdot) + \alpha \eta(\cdot)\big) = \mathcal{J}(x) + \delta \mathcal{J}(x;\eta)\alpha + o(\alpha)
By using the definition of the small-o notation, it is possible to express the limit relation that determines the first variation of the functional as
\delta \mathcal{J}(x;\eta) = \lim_{\alpha\to 0} \frac{\mathcal J\big(x + \alpha \eta\big) - \mathcal J\big(x\big)}{\alpha}
Thus, defining the function
g(\alpha) = \mathcal{J}\big(x(\cdot)+ \alpha \eta(\cdot) \big)
the first variation of the functional \mathcal{J} at x with respect to the direction \eta can be computed as
\delta \mathcal{J}(x;\eta) = g'(0) \tag{2}
Example 4 (directional derivative of a functional — first variation) Given the functional
\mathcal{F}(x) = \int_0^1 \big(x^2(t) + 1\big) \, \mathrm{d}t \ + x(1)
it’s first variation respect to a generic direction d(t) can be calculated by firstly determining the associated function g:\Bbb{R}\mapsto \Bbb{R} defined as
\begin{aligned} g(\alpha) & = \mathcal{F}\big(x(\cdot) + \alpha\, d(\cdot)\big) \\ & = \int_0^1 \big(x(t) + \alpha\, d(t)\big)^2 + 1\, \mathrm{d}t + x(1) + \alpha\, d(1) \end{aligned}
As reported in Equation 2, the first variation of the functional \mathcal{F}(x) can be regarded as the derivative of g(\alpha) with respect to \alpha evaluated at \alpha = 0; the first step is thus to determine
\begin{aligned} g'(\alpha ) &= \frac{\mathrm{d}}{\mathrm{d}\alpha}g(\alpha) \\ &= \frac{\mathrm{d}}{\mathrm{d}\alpha } \int_0^1 \big(x(t) + \alpha\, d(t)\big)^2 + 1\, \mathrm{d}t \\ &+ \frac{\mathrm{d}}{\mathrm{d}\alpha} \big(x(1) + \alpha\, d(1)\big) \end{aligned}
Assuming that \frac{\mathrm{d}}{\mathrm{d}\alpha}\int = \int \frac{\mathrm{d}}{\mathrm{d}\alpha} (an operation that cannot always be performed), the derivative of g(\alpha) can be written as
\begin{aligned} g'(\alpha) &= \int_0^1 \frac{\mathrm{d}}{\mathrm{d}\alpha} \Big(x(t) + \alpha\, d(t)\Big)^2\, \mathrm{d}t + \frac{\mathrm{d}}{\mathrm{d}\alpha} \Big(x(1) + \alpha \, d(1)\Big) \\ & = \int_0^1 2\big(x(t) + \alpha\, d(t)\big) d(t)\, \mathrm{d}t + d(1) \end{aligned}
Evaluating this expression for \alpha = 0 then the first variation of \mathcal{F}(x) with direction d(t) becomes
\delta \mathcal{F}(x;d) = g'(0) = \int_0^1 2\,x(t)\,d(t)\, \mathrm{d}t + d(1)
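The first variation just derived can be verified numerically. The sketch below (with assumed sample choices x(t) = t^2 and d(t) = t, and a trapezoidal quadrature) compares a finite-difference approximation of g'(0) with the analytic expression:
Code
# Numerical check of Example 4: F(x) = int_0^1 (x^2 + 1) dt + x(1)
tt <- seq(0, 1, length.out = 1000)
x <- function(t) t^2 # sample function (assumption)
d <- function(t) t   # sample direction (assumption)

# Trapezoidal rule helper
trapz <- function(t, y) sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)

Fx <- function(f) trapz(tt, f(tt)^2 + 1) + f(1)
g <- function(alpha) Fx(function(s) x(s) + alpha * d(s))

# Finite-difference approximation of g'(0)
alpha <- 1e-6
(g(alpha) - g(-alpha)) / (2 * alpha)

# Analytic first variation: int_0^1 2 x d dt + d(1) = 1/2 + 1 = 1.5
trapz(tt, 2 * x(tt) * d(tt)) + d(1)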
Integral and limits
In the computation of the variations we used the fact that
\dfrac{\mathrm{d}}{\mathrm{d}\alpha}\circ\int_a^b \quad = \quad \int_a^b \circ \dfrac{\mathrm{d}}{\mathrm{d}\alpha}
however this property is not valid in general: some regularity of the functions involved is required. To understand this better, we consider a simpler problem. The derivative is the limit of a difference quotient, so we may wonder whether it is always true that
\lim\circ\int \quad = \quad \int \circ \lim
in particular given a sequence of functions f_n(x), n=1,2,\ldots with the limit
f(x) = \lim_{n\to\infty} f_n(x),\qquad \forall x\in\Bbb{R}
we want to check if
\lim_{n\to\infty}\int_{-\infty}^{\infty} f_n(t)\,\mathrm{d}t \stackrel{?}{=} \int_{-\infty}^{\infty}\lim_{n\to\infty} f_n(t)\,\mathrm{d}t= \int_{-\infty}^{\infty}f(t)\,\mathrm{d}t
consider the functions
f_n(x) = \begin{cases} 0 & x \leq 0 \\ nx & 0 < x \leq \frac{1}{n} \\ 2-nx & \frac{1}{n} < x \leq \frac{2}{n} \\ 0 & x > \frac{2}{n} \end{cases}
Code
# Load required libraries
library(ggplot2)
library(latex2exp) # provides TeX() for LaTeX labels

# Define function f_n(x) for a given n
f_n <- function(x, n) {
  ifelse(x <= 0, 0,
         ifelse(x <= 1/n, n * x,
                ifelse(x <= 2/n, 2 - n * x, 0)))
}

# Create a data frame for plotting f_n for various values of n
plot_data <- data.frame()
x_vals <- seq(-0.25, 2, length.out = 1000) # x-range for the plot

# Add data for several values of n
for (n in c(1, 5, 10, 20, 25)) {
  plot_data <- rbind(
    plot_data,
    data.frame(
      x = x_vals,
      y = sapply(x_vals, f_n, n = n),
      func = paste("n =", n)
    )
  )
}

# Plot the functions f_n(x) for different values of n
ggplot(plot_data, aes(x = x, y = y, color = func)) +
  geom_line(linewidth = 1) +
  labs(title = TeX("Plots of Functions $f_n(x)$ for Various $n$"), x = "x", y = "Function Value") +
  theme_minimal() +
  theme(legend.position = "right") +
  scale_color_manual(
    values = scales::hue_pal()(length(unique(plot_data$func))),
    labels = lapply(unique(plot_data$func), TeX) # Apply LaTeX formatting to legend labels
  )
it is clear that
\lim_{n\to\infty} f_n(x) = 0,\qquad \forall x\in\Bbb{R}
and
\begin{aligned} \int_{-\infty}^\infty f_n(x)\, \mathrm{d}x &= \int_{0}^{1/n} nx \,\mathrm{d}x+ \int_{1/n}^{2/n} 2-nx \,\mathrm{d}x \\ & = \left[n\frac{x^2}{2}\right]_{x=0}^{x=1/n}+ \left[2x-n\frac{x^2}{2}\right]_{x=1/n}^{x=2/n} \\ & = \left[ n\frac{(1/n)^2}{2}-n\frac{(0)^2}{2}\right]+ \left[ 2\frac{2}{n}-n\frac{(2/n)^2}{2} -2\frac{1}{n}+n\frac{(1/n)^2}{2} \right] \\ & = \frac{1}{2n}+ \frac{4}{n}-\frac{2}{n} -\frac{2}{n}+\frac{1}{2n}=\frac{1}{n} \end{aligned}
and thus
\begin{aligned} \lim_{n\to\infty}\int_{-\infty}^\infty f_n(x) \mathrm{d}x & = \lim_{n\to\infty}\frac{1}{n} = 0 \\ \int_{-\infty}^\infty \lim_{n\to\infty}f_n(x) \mathrm{d}x & = \int_{-\infty}^\infty 0 \mathrm{d}x = 0 \end{aligned}
and in this case limit and integral can be exchanged.
Consider now the functions
g_n(x) = \begin{cases} 0 & x \leq 0 \\ n^2x & 0 < x \leq \frac{1}{n} \\ 2n-n^2x & \frac{1}{n} < x \leq \frac{2}{n} \\ 0 & x > \frac{2}{n} \end{cases}
Code
# Load required libraries
library(ggplot2)
library(latex2exp) # provides TeX() for LaTeX labels

# Define function g_n(x) for a given n
g_n <- function(x, n) {
  ifelse(x <= 0, 0,
         ifelse(x <= 1/n, n^2 * x,
                ifelse(x <= 2/n, 2 * n - n^2 * x, 0)))
}

# Create a data frame for plotting g_n for various values of n
plot_data <- data.frame()
x_vals <- seq(-0.25, 2, length.out = 1000) # x-range for the plot

# Add data for several values of n
for (n in c(1, 5, 10, 20, 25)) {
  plot_data <- rbind(plot_data,
                     data.frame(x = x_vals, y = sapply(x_vals, g_n, n = n),
                                func = paste("n =", n)))
}

# Plot the function g_n(x) for different values of n
ggplot(plot_data, aes(x = x, y = y, color = func)) +
  geom_line(linewidth = 1) +
  labs(title = TeX("Plots of Function $g_n(x)$ for Various $n$"),
       x = "x", y = "Function Value") +
  theme_minimal() +
  theme(legend.position = "right") +
  scale_color_manual(
    values = scales::hue_pal()(length(unique(plot_data$func))),
    labels = lapply(unique(plot_data$func), TeX) # Apply LaTeX formatting to legend labels
  )
it is clear that
\lim_{n\to\infty} g_n(x) = 0,\qquad \forall x\in\Bbb{R}
and
\begin{aligned} \int_{-\infty}^\infty g_n(x)\,\mathrm{d}x &= \int_{0}^{1/n} n^2x\,\mathrm{d}x+ \int_{1/n}^{2/n} 2n-n^2x\,\mathrm{d}x \\ & = \frac{1}{2}+\frac{1}{2} = 1 \end{aligned}
and thus
\begin{aligned} \lim_{n\to\infty}\int_{-\infty}^\infty g_n(x) \mathrm{d}x & = \lim_{n\to\infty}1 = 1 \\ \int_{-\infty}^\infty \lim_{n\to\infty}g_n(x) \mathrm{d}x & = \int_{-\infty}^\infty 0 \mathrm{d}x = 0 \end{aligned}
and in this case limit and integral cannot be exchanged.
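A short numerical sketch makes the failure visible: the integral of g_n stays at 1 for every n even though g_n converges pointwise to the zero function (the upper integration bound 2/n isolates the support of the spike):
Code
# The integral of g_n equals 1 for every n, while g_n -> 0 pointwise
g_n <- function(x, n) {
  ifelse(x <= 0, 0,
         ifelse(x <= 1/n, n^2 * x,
                ifelse(x <= 2/n, 2 * n - n^2 * x, 0)))
}
sapply(c(1, 10, 100, 1000), function(n)
  integrate(g_n, lower = 0, upper = 2 / n, n = n)$value)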
Thus, in general, whenever limits, derivatives and integrals are exchanged we should check that the operation is legitimate.
Optimality condition
Considering the minimization of a function f:A\subseteq \Bbb{R}^n\mapsto \Bbb{R}, it has been shown that the first-order necessary condition for a point \bm{x}^\star \in A to be a minimum point is that its gradient \nabla f(\bm{x}^\star) must be null. Similarly, in the calculus of variations it is proven that the first-order necessary condition for the optimality of the solution is that the first variation of the functional \mathcal{J} satisfies
\delta \mathcal{J}(x;\eta) = 0
for every admissible perturbation \eta(t).
In particular, we say a perturbation \eta \in\Bbb{V} is admissible for the functional \mathcal{J} defined on A\subseteq\Bbb{V}, with respect to the function y^\star(t), if y^\star(\cdot) + \alpha \eta(\cdot) \in A for all values of \alpha sufficiently close to 0.
Fundamental lemma of the calculus of variations
Lemma 1 (Fundamental lemma of the calculus of variations) Given a function
f:[a,b]\mapsto \Bbb{R}, \qquad\textrm{(piecewise continuous)}
such that the integral satisfies \int_a^b f(x)\,g(x) \, \mathrm{d}x = 0 \tag{3}
for all g:[a,b] \mapsto \Bbb{R} smooth, i.e. g \in C^\infty([a,b]), with g^{(k)}(a) = g^{(k)}(b) = 0, \quad\forall k=0,1,2,3,\ldots
then f(x) is identically null.
Sign permanence
In order to later prove the fundamental lemma just stated, we recall the sign permanence lemma:
Lemma 2 Given a function f:[a,b]\mapsto \Bbb{R} continuous such that f(c) >0, then there exists a \delta >0 such that
f(x) \geq \frac{f(c)}{2} \qquad \forall x \in [c-\delta,c+\delta]
Proof. The proof of this lemma can be done by defining the parameter \varepsilon = f(c) / 2; since f(x) is continuous, there exists a \delta>0 such that: |f(x) - f(c)| \leq \varepsilon = \frac{f(c)}{2} \qquad \forall |c-x|\leq \delta
Expanding the absolute value, the inequality becomes:
\begin{aligned} & -\frac{f(c)}{2} = - \varepsilon \leq f(x)-f(c) \leq \varepsilon = \frac{f(c)}{2} \\ &\implies \quad \underbrace{-\frac{f(c)}{2} + f(c) \leq f(x)}_{\frac{f(c)}{2} \leq f(x)} \leq f(c) + \frac{f(c)}{2} \end{aligned}
which proves the sign permanence lemma. \square
Proof of the fundamental lemma
The proof of the fundamental lemma of the calculus of variations can be done by contradiction.
Consider the integral in Equation 3 and suppose that for the given function f there exists a point c such that f(c) > 0 (in general the proof can be carried out for f(c)\neq 0). From the sign permanence lemma there exists an interval [c-\delta,c+\delta] such that f(x) \geq \frac{f(c)}{2} > 0. Define the function g as
g(x) = \begin{cases} \dfrac{x-(c-\delta)}{\delta} \qquad & x\in [c-\delta, c] \\ \dfrac{c-x+\delta}{\delta} & x\in[c,c+\delta] \\ 0 & \textrm{otherwise} \end{cases}
Code
# Triangular "hat" function g centered at c with half-width delta
g <- function(x, c, delta) {
  ifelse(x >= (c - delta) & x <= c,
         (x - (c - delta)) / delta,
         ifelse(x > c & x <= (c + delta),
                (c - x + delta) / delta,
                0))
}

# Plot of the function g
plot_g <- function(c, delta) {
  # Vector of x values
  x_values <- seq(c - delta - 1, c + delta + 1, by = 0.01)

  # Evaluate g at each x
  y_values <- g(x_values, c, delta)

  plot(x_values, y_values, type = "l", col = "blue",
       main = expression(g(x)),
       xlab = "x", ylab = expression(g(x)),
       ylim = c(0, 1), lwd = 2)

  # Horizontal and vertical guide lines
  abline(h = seq(0, 1, by = 0.2), col = "lightgray", lty = "dotted")
  abline(v = c(c - delta, c, c + delta), col = "red", lty = "dotted")
}

c <- 0
delta <- 1
plot_g(c, delta)
Note that this function is continuous but g\notin C^\infty (a function with this additional property will be described later). With this definition the integral \int_a^b fg\, \mathrm{d}x becomes
\int_{c-\delta}^{c+\delta} f(x)g(x)\, \mathrm{d}x \geq \frac{f(c)}{2} \int_{c-\delta}^{c+\delta} g(x)\, \mathrm{d}x = \frac{f(c)}{2}\,\delta > 0
The integral \int_a^b fg\, \mathrm{d}x is therefore strictly greater than zero (given that f has at least one point c such that f(c) > 0), in contradiction with the initial requirement that the integral evaluate to zero.
To create a direction function g that is in the set C^\infty we can use the function h (for which this property can be demonstrated), shown in Figure 5 and defined as:
h(t) = \begin{cases} \mathrm{e}^{-1/t} \qquad & t > 0 \\ 0 & t\leq 0 \end{cases} \tag{4}
Code
# Function h
h <- function(t) {
  ifelse(t > 0, exp(-1/t), 0)
}

# Plot of the function h
plot_h <- function() {
  # Vector of t values
  t_values <- seq(-1, 3, by = 0.01)

  # Values of h at each t
  y_values <- h(t_values)

  # The plot
  plot(t_values, y_values, type = "l", col = "blue",
       main = expression(h(t)),
       xlab = "t", ylab = expression(h(t)),
       ylim = c(0, 1), lwd = 2)

  # Horizontal guide lines
  abline(h = seq(0, 1, by = 0.2), col = "lightgray", lty = "dotted")
}

plot_h()
Code
library(latex2exp) # provides TeX() for LaTeX labels

# Function h(t)
h <- function(t) {
  ifelse(t > 0, exp(-1/t), 0)
}

# Derivatives (for t > 0, each h^(k) is exp(-1/t) times a polynomial in 1/t)
h_prime <- function(t) {
  ifelse(t > 0, exp(-1/t) / t^2, 0)
}
h_double_prime <- function(t) {
  ifelse(t > 0, exp(-1/t) * (1/t^4 - 2/t^3), 0)
}
h_triple_prime <- function(t) {
  ifelse(t > 0, exp(-1/t) * (1/t^6 - 6/t^5 + 6/t^4), 0)
}

# 2x2 grid of plots
par(mfrow = c(2, 2), mar = c(3, 3, 2, 1))

# Vector of t values
t_values <- seq(-1, 5, by = 0.01)

# Values of h(t) and of its derivatives
h_values <- h(t_values)
h_prime_values <- h_prime(t_values)
h_double_prime_values <- h_double_prime(t_values)
h_triple_prime_values <- h_triple_prime(t_values)

# Plot of h(t)
plot(t_values, h_values, type = "l", col = "blue",
     main = TeX("$h(t)$"),
     ylim = c(0, 1), lwd = 2, axes = FALSE)
axis(2) # y-axis only
box()

# Plot of the first derivative
plot(t_values, h_prime_values, type = "l", col = "red",
     main = TeX("$h^{(1)}(t)$"),
     ylim = range(h_prime_values), lwd = 2, axes = FALSE)
axis(2)
box()

# Plot of the second derivative
plot(t_values, h_double_prime_values, type = "l", col = "green",
     main = TeX("$h^{(2)}(t)$"),
     ylim = range(h_double_prime_values), lwd = 2, axes = FALSE)
axis(2)
box()

# Plot of the third derivative
plot(t_values, h_triple_prime_values, type = "l", col = "purple",
     main = TeX("$h^{(3)}(t)$"),
     ylim = range(h_triple_prime_values), lwd = 2, axes = FALSE)
axis(2)
box()
A way to create a function with a bell shape on the range [0,1] is to compute
g(t) := h(t)\, h(1-t)
whose graph on the range [0,1] is similar to the one shown in Figure 7.
Code
library(latex2exp) # provides TeX() for LaTeX labels

# Functions h(t) and g(t)
h <- function(t) { ifelse(t > 0, exp(-1/t), 0) }
g <- function(t) { h(t) * h(1 - t) }

# Vector of t values
t_values <- seq(-1, 2, by = 0.01)

# Values of g(t)
g_values <- g(t_values)

# Plot of g(t)
plot(t_values, g_values, type = "l", col = "blue",
     main = TeX("$g(t) = h(t) \\cdot h(1 - t)$"),
     xlab = "t", ylab = TeX("$g(t)$"),
     ylim = c(0, max(g_values)), lwd = 2)
grid()
In particular, to demonstrate the fundamental lemma of the calculus of variations, we need to rescale the function g so that it has a bell shape centered at the point c with a bell width \delta. We consider the following formulation:
g(t) = K \, h\left(\frac{t - c + \delta}{2\delta}\right) h\left(\frac{\delta + c - t}{2\delta}\right)
where the constant K \in \mathbb{R} is chosen such that g(c) = 1 or \int_a^b g(x) \, \mathrm{d}x = 1.
In this case, we observe that:
- g \in C^\infty(\Bbb{R}),
- g(x) \geq 0 for all values of x, and specifically,
- g(x) = 0 for all x \notin [c - \delta, c + \delta].
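The sketch below builds this rescaled bump in R and checks the normalization and the support numerically (the center cc = 0.5 and half-width delta = 0.2 are assumed sample values; K is computed by quadrature rather than in closed form):
Code
# C-infinity bump centered at cc with half-width delta, normalized to unit integral
h <- function(t) ifelse(t > 0, exp(-1/t), 0)
bump <- function(t, cc, delta) {
  h((t - cc + delta) / (2 * delta)) * h((delta + cc - t) / (2 * delta))
}
cc <- 0.5; delta <- 0.2
K <- 1 / integrate(bump, cc - delta, cc + delta, cc = cc, delta = delta)$value

# Unit integral, and zero at (and beyond) the edges of the support
integrate(function(t) K * bump(t, cc, delta), cc - delta, cc + delta)$value
K * bump(c(cc - delta, cc, cc + delta), cc, delta)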
Lemma 1 can now be proven by contradiction.
Proof. As in the previous case, let us consider a function f(x) that is not identically zero. For this argument, we assume there exists at least one point c where f(c) > 0. By the sign permanence theorem, we can assert that:
f(x) \geq \frac{f(c)}{2} \qquad \forall x \in [c - \delta, c + \delta].
This inequality indicates that the function f(x) remains bounded away from zero within the interval [c - \delta, c + \delta].
Next, we examine the original integral \int_a^b fg\, \mathrm{d}x. Given our previous findings, we can conclude that this integral is not equal to zero. Specifically, we have:
\int_a^b f(x)g(x)\, \mathrm{d}x \geq \frac{f(c)}{2}\int_a^b g(x)\, \mathrm{d}x > 0.
Here, the term \frac{f(c)}{2} is positive and, since g(x) is non-negative for all x (and vanishes outside [c - \delta, c + \delta]), the integral \int_a^b f(x)g(x)\, \mathrm{d}x yields a positive value. This contradicts the hypothesis that the integral vanishes for every admissible g; hence f must be identically zero. \square
Euler Lagrange equation
Pendulum example
The Euler–Lagrange equation is the generalization of the least action principle used in physics. Consider as a practical example the motion of a pendulum with a mass m that is free to oscillate about a pivot point at the end of a rope of length l; the kinetic energy T and the potential energy V of the system can be expressed as
T = \frac 1 2 m v^2, \qquad V = mgy.
Considering \theta as the angle that the rope forms with the vertical axis, we can rewrite the energies as functions of the angular position \theta and velocity \dot \theta as
T(\theta,\dot \theta) = \frac 1 2 m l^2 \dot\theta^2, \qquad V(\theta) = - m gl\cos\theta.
To solve the dynamic equation \theta(t) of the mechanism we can compute the lagrangian L of the system defined as
L(\theta,\dot\theta) = T(\theta,\dot\theta)-V(\theta) = \frac m 2 l^2\dot\theta^2 + lmg\cos\theta
As law that’s analyzed in mechanics physics we can state that the solution of the dynamics of the system is the one the function that minimize the action \mathcal{A} of the system defined as
A(\theta) = \int_{t_0}^{t_1} L (\theta,\dot\theta)\, \mathrm{d}t \tag{5}
where the values \theta(t_0)= \theta_0 and \theta(t_1)=\theta_1 are known parameters.
In practice, to determine the required solution we can use the tools of analytical mechanics to find the trajectory, which can then be shown to be the minimum of the action \mathcal{A} (which is indeed a functional).
However, we can also try to determine analytically the function \theta^\star(t) that minimizes the functional \mathcal{A}(\cdot) by computing its directional derivatives.
Let us now consider the function \theta^\star(t) and a direction \delta\theta(t) satisfying
\delta\theta(t_0)=\delta\theta(t_1) = 0
We can then use the fundamental lemma of the calculus of variations to determine the minimizing function (the expression Equation 5 of the action is comparable to Equation 3 of the lemma).
To determine the minimum point we have to find the function \theta^\star whose first variation of the action \mathcal{A} is zero for each direction \delta\theta, and so in this example we need to compute
\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\alpha} \mathcal{A} \big(\theta^\star + \alpha\, \delta\theta\big) & = \frac{\mathrm{d}}{\mathrm{d}\alpha} \int_{t_0}^{t_1} L\big( \theta ^\star + \alpha\, \delta\theta, \dot\theta^\star + \alpha\, \delta\dot\theta \big) \,\mathrm{d}t \\ & = \int_{t_0}^{t_1} \left[\frac{\partial L}{\partial\theta} \frac{\mathrm{d}}{\mathrm{d}\alpha} \big(\theta^\star + \alpha \,\delta\theta\big) + \frac{\partial L}{\partial\dot\theta} \frac{\mathrm{d}}{\mathrm{d}\alpha} \big(\dot \theta^\star + \alpha \,\delta\dot\theta\big) \right] \, \mathrm{d}t \\ & = \int_{t_0}^{t_1} \big(-lmg\sin(\theta^\star + \alpha\, \delta\theta)\big)\delta\theta + ml^2\big( \dot\theta^\star + \alpha \, \delta\dot\theta \big)\, \delta\dot\theta \, \mathrm{d}t \end{aligned}
Note that in the first step we made the implicit assumption that
\frac{\mathrm{d}}{\mathrm{d}\alpha} \int = \int\frac{\mathrm{d}}{\mathrm{d}\alpha}
which, however, is not always legitimate. Evaluating the previous expression at \alpha = 0 gives the first variation of the action \mathcal{A}:
\begin{aligned} \delta \mathcal{A}(\theta^\star;\delta\theta) & = \frac{\mathrm{d}}{\mathrm{d}\alpha} \mathcal{A}\big(\theta^\star + \alpha\, \delta\theta\big) \Big|_{\alpha = 0}\\ & = \int_{t_0}^{t_1} \big(-lmg \sin\theta^\star\delta\theta + ml^2\dot\theta^\star \delta\dot\theta \big) \, \mathrm{d}t = 0 \qquad \end{aligned}
for all \delta\theta. Integrating by parts allows us to remove the term \delta\dot\theta, which is hard to handle directly; in fact
\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t}\big( ml^2\dot\theta^\star \, \delta\theta\big) & = \frac{\mathrm{d}}{\mathrm{d}t}\big(ml^2\dot\theta^\star\big)\,\delta\theta + ml^2\dot\theta^\star \delta\dot\theta \\ & \Downarrow \\ ml^2\dot\theta^\star \delta\dot\theta &= \frac{\mathrm{d}}{\mathrm{d}t}\big( ml^2\dot\theta^\star \, \delta\theta\big) - \frac{\mathrm{d}}{\mathrm{d}t}\big(ml^2\dot\theta^\star\big)\,\delta\theta \end{aligned}
Substituting this integration-by-parts identity gives the result
\begin{aligned} \delta\mathcal{A} \big(\theta^\star;\delta\theta\big) & = \int_{t_0}^{t_1} \left[ - lmg\sin\theta^\star - \frac{\mathrm{d}}{\mathrm{d}t}\big(ml^2\dot\theta^\star\big) \right]\delta\theta\, \mathrm{d}t + \cancel{\big[ ml^2 \dot\theta^\star \delta\theta \big]\Big|_{t_0}^{t_1} } \\ & = \int_{t_0}^{t_1} \underbrace{\left[ -lmg \sin\theta^\star - \frac{\mathrm{d}}{\mathrm{d}t}\big(ml^2 \dot\theta^\star\big) \right]}_{f(t)} \delta\theta\, \mathrm{d}t = 0 \end{aligned}
We can now see that the marked expression plays the role of the function f of the fundamental lemma; since the relation must hold for every direction \delta\theta, it must be
-lmg \sin\theta^\star - \frac{\mathrm{d}}{\mathrm{d}t}\big(ml^2 \dot\theta^\star\big) = 0
To complete the analysis of the pendulum motion, the problem is now to determine the function \theta^\star(t) that satisfies this expression and also matches the boundary conditions \theta(t_0) = \theta_0 and \theta(t_1)=\theta_1.
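A numerical route is a simple shooting method: integrate the ODE \ddot\theta = -(g/l)\sin\theta forward from t_0 and adjust the unknown initial angular velocity until the terminal condition is met. The sketch below (with assumed values g/l = 9.81, t_0 = 0, t_1 = 1, \theta_0 = 0, \theta_1 = 0.5 and a hand-rolled RK4 integrator) illustrates the idea:
Code
# Shooting method for the pendulum BVP: theta'' = -(g/l) sin(theta)
g_over_l <- 9.81
t0 <- 0; t1 <- 1
theta0 <- 0; theta1 <- 0.5

# Integrate with classical RK4 and return the residual theta(t1) - theta1
shoot <- function(omega0, n = 1000) {
  h <- (t1 - t0) / n
  y <- c(theta0, omega0) # state: (theta, theta')
  f <- function(y) c(y[2], -g_over_l * sin(y[1]))
  for (i in 1:n) {
    k1 <- f(y); k2 <- f(y + h/2 * k1)
    k3 <- f(y + h/2 * k2); k4 <- f(y + h * k3)
    y <- y + h/6 * (k1 + 2*k2 + 2*k3 + k4)
  }
  y[1] - theta1
}

# Root-find the initial angular velocity that hits theta(t1) = theta1
uniroot(shoot, c(-10, 10))$root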
General formulation
Given a generic functional of the form
\mathcal{A}(x) = \int_a^b L \big(x(t),x'(t),t\big)\, \mathrm{d}t
the problem is to minimize the functional \mathcal{A}(x) for a function x \in\Bbb{V} subject to the boundary conditions x(a) = x_a and x(b) = x_b in the function space \Bbb{V} defined as
\Bbb{V} = \big\{ x \ | \ x\in C^2([a,b]) \textrm{ with } x(a) = x_a,x(b) = x_b \big\}
Let \Bbb{D} be the function space of all the feasible perturbation directions, defined as
\Bbb{D} = \big\{ \delta x \ | \ \delta x \in C^{\infty}([a,b]) \textrm{ with } \delta x(a) = \delta x(b) = 0 \big\}
the function x(t) that minimizes the functional is the one whose first variation is zero for any admissible perturbation, i.e. such that
\delta \mathcal{A}(x;\delta x) = \frac{\mathrm{d}}{\mathrm{d}\alpha} \mathcal{A}(x+\alpha\, \delta x) \Big|_{\alpha = 0} = 0 \qquad \forall \delta x \in\Bbb{D}
Assuming the possibility to correctly apply the rule
\frac{\mathrm{d}}{\mathrm{d}\alpha} \int = \int \frac{\mathrm{d}}{\mathrm{d}\alpha}
we can express the directional derivative of the functional \mathcal{A} evaluated at x + \alpha\, \delta x as
\begin{aligned} \frac{\mathrm{d}\mathcal{A}(x+\alpha\delta x)}{\mathrm{d}\alpha} & = \frac{\mathrm{d}}{\mathrm{d}\alpha} \int_a^b L \big( x + \alpha\, \delta x ,x' + \alpha\, \delta x', t\big)\, \mathrm{d}t \\ & = \int_a^b \left( \frac{\partial L}{\partial x}\delta x + \frac{\partial L}{\partial x'}\delta x'\right)\, \mathrm{d}t \\ \frac{\mathrm{d}\mathcal{A}}{\mathrm{d}\alpha}\Big|_{\alpha = 0} & = \int_a^b \left( \frac{\partial L}{\partial x}(x,x',t)\delta x + \frac{\partial L}{\partial x'}(x,x',t)\delta x' \right)\, \mathrm{d}t \end{aligned}
Integrating by parts, it is possible to convert the term associated with \delta x' into pieces depending on \delta x, one of which, when evaluated, vanishes because \delta x(a) = \delta x(b) = 0:
\begin{aligned} \frac{\mathrm{d}\mathcal{A}}{\mathrm{d}\alpha} \Big|_{\alpha = 0} & = \int_a^b \left[ \frac{\partial L}{\partial x}(x,x',t) - \frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}(x,x',t) \right) \right] \delta x\, \mathrm{d}t \\ & + \int_a^b\frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}(x,x',t)\delta x \right) \, \mathrm{d}t \\ & = \int_a^b \left[ \frac{\partial L}{\partial x}(x,x',t) - \frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}(x,x',t) \right) \right] \delta x\, \mathrm{d}t \\ &+ \cancel{\left[ \frac{\partial L}{\partial x'}(x,x',t)\delta x \right]_{t=a}^{t=b}} \end{aligned}
Here the integration by parts is based on the identity
\frac{\mathrm{d}}{\mathrm{d}t} \left( \frac{\partial L}{\partial x'}(x,x',t)\delta x \right) = \frac{\mathrm{d}}{\mathrm{d}t} \left( \frac{\partial L}{\partial x'}(x,x',t) \right) \delta x + \frac{\partial L}{\partial x'}(x,x',t)\delta x'
Solving for the term containing \delta x' we have
\frac{\partial L}{\partial x'}(x,x',t)\delta x' = \underbrace{\frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}(x,x',t)\delta x \right)}_\textrm{term \#1} - \underbrace{\frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}(x,x',t)\right) \delta x}_\textrm{term \#2}
The second term remains in the variation of the functional \mathcal{A}, while the first term is eliminated because its evaluation becomes
\begin{aligned} \int_a^b\frac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial L}{\partial x'}\delta x \right) \,\mathrm{d}t &= \left[ \frac{\partial L}{\partial x'}(x(t),x'(t),t) \delta x(t) \right]_a^b \\ &= \frac{\partial L}{\partial x'}(x(b),x'(b),b) \delta x(b) - \frac{\partial L}{\partial x'}(x(a),x'(a),a) \delta x(a) \end{aligned}
Looking at the function space \Bbb{D} we see that \delta x(a) = \delta x(b) = 0, and so the evaluation of this integral is null.
We can now see that the condition for the functional \mathcal{A} to have a minimum, by the fundamental lemma of the calculus of variations, is that the function f(t) defined as follows be identically null:
\frac{\mathrm{d}\mathcal{A}(x+\alpha\, \delta x)}{\mathrm{d}\alpha} \Big|_{\alpha = 0} = \int_a^b \underbrace{\left( \frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial x'}\right)}_{=f(t)} \delta x \, \mathrm{d}t = 0 \tag{6}
In general this means that the function that minimizes the functional \mathcal A must solve the following second-order ordinary differential equation (the Euler–Lagrange equation) with boundary conditions:
\begin{cases} \dfrac{\mathrm{d}}{\mathrm{d}t} \dfrac{\partial L }{\partial x'} - \dfrac {\partial L }{\partial x} = 0 \\[0.5em] x(a) = x_a \\ x(b) = x_b \end{cases}
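To see this equivalence concretely, one can also attack the problem directly: discretize the functional, minimize it numerically, and compare with the solution of the Euler–Lagrange boundary value problem. The sketch below uses the assumed Lagrangian L(x,x',t) = x'^2 + x^2 on [0,1] with x(0)=0, x(1)=1, whose Euler–Lagrange equation x''=x has solution x(t) = \sinh(t)/\sinh(1):
Code
# Direct minimization of the discretized action vs. the Euler-Lagrange solution
n <- 50
tt <- seq(0, 1, length.out = n + 1)
h <- 1 / n

# Discretized action for L = (x')^2 + x^2: forward differences, midpoint values
action <- function(x_inner) {
  x <- c(0, x_inner, 1) # impose the boundary conditions
  dx <- diff(x) / h     # approximate x'
  xm <- (head(x, -1) + tail(x, -1)) / 2
  sum((dx^2 + xm^2) * h)
}

fit <- optim(par = tt[2:n], fn = action, method = "BFGS")

# Euler-Lagrange solution of x'' = x with x(0) = 0, x(1) = 1
x_exact <- sinh(tt) / sinh(1)
max(abs(c(0, fit$par, 1) - x_exact)) # small discretization error expected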
Example 5 (computation of the first variation) Given the generic functional defined as
\mathcal{F}(x) = \int_a^b G(x,x',t)\, \mathrm{d}t + x(a)x(b)
in order to compute it’s first variation we can use the standard approach by evaluating the perturbation of the functional respect to a function x:
\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\alpha} \mathcal{F} \big(x+\alpha\, \delta x\big) &= \frac{\mathrm{d}}{\mathrm{d}\alpha} \int_a^b G\big(x+\alpha\, \delta x, x' + \alpha\, \delta x',t\big)\, \mathrm{d}t \\[1em] & +\frac{\mathrm{d}}{\mathrm{d}\alpha} \Big( \big(x(a) + \alpha \delta x(a)\big)\big(x(b) + \alpha\, \delta x(b)\big)\Big) \\[1em] &= \int_a^b \left( \frac{\partial G}{\partial x} \delta x + \frac{\partial G}{\partial x'} \delta x' \right) \, \mathrm{d}t \\[1em] & + \delta x(a) \big(x(b) + \alpha\, \delta x(b)\big) + \big( x(a) + \alpha\,\delta x(a) \big)\delta x(b) \\[1em] &\downarrow\quad\alpha=0 \\[1em] &=\int_a^b \left( \frac{\partial G}{\partial x} \delta x + \frac{\partial G}{\partial x'}\delta x' \right) \, \mathrm{d}t + \delta x(a)x(b) + x(a)\delta x(b) \end{aligned}
This verbose computation can be simplified by using the Gâteaux derivative notation \delta, which allows us to express the first variation as
\begin{aligned} \delta \mathcal{F}(x) & = \delta \left(\int_a^b G(x,x',t)\, \mathrm{d}t + x(a)x(b)\right) \\[1em] &= \int_a^b \delta G(x,x',t)\, \mathrm{d}t + \delta\big(x(a)x(b)\big)\\[1em] &= \int_a^b \left( \frac{\partial G}{\partial x}(x,x',t)\delta x + \frac{\partial G}{\partial x'}(x,x',t)\delta x' \right)\, \mathrm{d}t \\[1em] & + \delta x(a) x(b) + x(a)\delta x(b) \end{aligned}
Using integration by parts,
\dfrac{\mathrm{d}}{\mathrm{d}t} \left(\frac{\partial G}{\partial x'}(x,x',t)\delta x\right) = \dfrac{\mathrm{d}}{\mathrm{d}t}\frac{\partial G}{\partial x'}(x,x',t)\delta x + \frac{\partial G}{\partial x'}(x,x',t)\delta x'
and substituting
\begin{aligned} \delta \mathcal{F}(x) &= \int_a^b \underbrace{\left(\frac{\partial G}{\partial x}(x,x',t) -\dfrac{\mathrm{d}}{\mathrm{d}t}\frac{\partial G}{\partial x'}(x,x',t)\right) }_{(A)} \delta x\, \mathrm{d}t \\[1em] & + \underbrace{\left[ x(a)+\frac{\partial G}{\partial x'}(x(b),x'(b),b)\right] }_{(B)}\delta x(b) \\ & + \underbrace{\left[ x(b)-\frac{\partial G}{\partial x'}(x(a),x'(a),a)\right] }_{(C)}\delta x(a) \end{aligned}
Using the fundamental lemma of the calculus of variations, (A)=0; then, using variations different from zero on the boundary, it follows that (B)=(C)=0, and the stationary points of the functional \mathcal{F}(x) satisfy the boundary value problem (BVP)
\left\{\begin{aligned} \frac{\partial G}{\partial x}(x,x',t) -\dfrac{\mathrm{d}}{\mathrm{d}t}\frac{\partial G}{\partial x'}(x,x',t) &=0 \\ x(a)+\frac{\partial G}{\partial x'}(x(b),x'(b),b) &=0 \\ x(b)-\frac{\partial G}{\partial x'}(x(a),x'(a),a) &=0 \end{aligned}\right.
The brachistochrone problem
As seen in the first part of these notes, functional minimization can be used to solve real problems. For example, the brachistochrone problem asks for the path that minimizes the time T to reach a point B from a point A for a mass m subjected only to the force of gravity: this means minimizing the functional
T = \int_A^B \frac 1 v \, \mathrm{d}s
where v is the velocity of the point along the arc length s. To describe the problem we consider x as the horizontal axis and y the vertical one (with positive direction facing upwards); the point A is described by the coordinates (x_a,y_a) = (0,0) (it is at the origin of the reference frame), while the point B = (x_b,y_b) is placed anywhere in the lower half-plane (in fact we must require y_b<0 in order to have a free fall of the object).
Considering that at the initial position A both the kinetic and potential energy are zero, due to the conservation of energy we can derive that
0 = mgy + \frac 1 2 m v^2 \quad \implies \quad v = \sqrt{-2gy}
The velocity v thus depends on the vertical coordinate y of the point; the next step is to relate the arc length \mathrm{d}s to the horizontal variation \mathrm{d}x. We can see that
\mathrm{d}s = \sqrt{\mathrm{d}x^2 + \mathrm{d}y^2} = \sqrt{\mathrm{d}x^2 + y'^2(x)\mathrm{d}x^2} = \sqrt{1 + y'^2} \mathrm{d}x
The brachistochrone problem can thus be stated as the following functional minimization problem:
\begin{aligned} \textrm{minimize:} \qquad & \mathcal{F}(y) = \int_{x_a}^{x_b} L(y,y')\, \mathrm{d}x \\ \textrm{subject to:} \qquad & y\big(x_a\big) = y_a = 0 \qquad y\big(x_b\big) = y_b < 0 \end{aligned}
where the independent variable is the x coordinate, and
L(y,y') = \sqrt{\frac{1+y'^2}{-2gy}}
The Euler–Lagrange expression for this Lagrangian can be computed as
\begin{aligned} \frac{\partial L}{\partial y} - \frac{\mathrm{d}}{\mathrm{d}x} \frac{\partial L}{\partial y'} &= -\frac{1+3 y'^2 + 2y'^4 - 2 y \, y''} {2\sqrt 2 g^2 y^3 \sqrt{\left(- \dfrac{1 + y'^2}{gy}\right)^3}} \\ &= \frac{1+3 y'^2 + 2y'^4 - 2 y \, y''} {2\sqrt{2} g y^2 (1 + (y')^2)} \sqrt{-\dfrac{gy}{1 + y'^2}} \end{aligned}
and the functional minimization is equivalent to the following boundary value problem:
\left\{ \begin{aligned} & 1+3\,y'(x)^2 + 2\,y'(x)^4 - 2\,y(x)y''(x) = 0 \\ &y(0) = 0 \\ &y(x_b) = y_b \end{aligned} \right.
Even if it’s not clear at first sight, it can be proven that the solution of the differential equation is a cycloid described by the equations
x(\theta) = \frac{1}{2} k^2 \big(\theta-\sin\theta\big) \qquad y(\theta) = \frac{1}{2} k^2 \big(1-\cos\theta \big)
where the constant k and the range of the new angle \theta of the parametrization can be found by using the boundary conditions; in particular the initial condition on points A allows to state that \theta_a = 0 (in fact x(0) = \frac{1}{2} k^ 2 0 = 0 and also y(0) = 0), while considering the final conditions we can determine both \theta_b and k solving this system of non-linear equation in such variables:
\begin{cases} x_b = \dfrac{1}{2} k^2 \big(\theta_b - \sin\theta_b\big) \\[0.5em] y_b = \dfrac{1}{2} k^2 \big(1-\cos\theta_b\big) \end{cases}
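The sketch below solves this system numerically: eliminating k between the two equations leaves a single scalar equation in \theta_b, which can be handled with uniroot() (the endpoint (x_b, y_b) = (1, -0.6) is an assumed sample value):
Code
# Recover the cycloid parameters (k, theta_b) for a given endpoint
x_b <- 1; y_b <- -0.6

# Dividing the two equations: (theta - sin(theta)) / (1 - cos(theta)) = -x_b / y_b
res <- function(theta) (theta - sin(theta)) / (1 - cos(theta)) + x_b / y_b
theta_b <- uniroot(res, c(1e-6, 2 * pi - 1e-6))$root

# Then k follows from the y equation: y_b = -(1/2) k^2 (1 - cos(theta_b))
k <- sqrt(-2 * y_b / (1 - cos(theta_b)))
c(theta_b = theta_b, k = k)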