Optimization of Functionals and the Calculus of Variations

There are many settings in Economics, particularly in mechanism design, where one wants to fund a function that maximizes (or minimizes) some objective. It turns out this works similarly to regular calculus.

Motivation

Consider the expected revenue of a mechanism designed to sell a good to one agent who’s value of the good is distributed uniformly with support [0,1][ 0, 1 ]. This expected revenue is known to be

R[p]=01p(v)(2v1)dvR[p] = \int_0^1 p(v) (2 v - 1)\, dv

where p(v)p(v) is a function that determines the probability of selling the good to the agent when her value is vv and 2v12 v - 1 is the virtual surplus. The above equation, R[p]R[p], is a functional of pp. That is, it maps the function pp to a real number. Suppose that you wanted to find the revenue maximizing mechanism pp. Then, the problem you want to solve is

maxp01p(v)(2v1)dv\max_p \int_0^1 p(v) (2 v - 1)\, dv

subject to 0p(v)10 \leq p(v) \leq 1. If you look at this problem closely, you will note that the optimal function must be

p(v)={0 if v<0.51 if v>0.5p(v) = \begin{cases} 0 &\text{ if } v < 0.5 \\ 1 &\text{ if } v > 0.5 \end{cases}

where the value at one half does not matter. This is because pp is multiplied by some “slope”. When the slope is positive, we want pp to be as large as possible. When the slope is negative, we want pp to be as small as possible. Indeed, the functional derivative of R[p]R[p] is

δRδp(v)=2v1\frac{\delta R}{\delta p}(v) = 2v - 1

We will see that the functional derivative formalizes our intuition about this slope. It will allow us to solve optimization problems that are nonlinear.

On the reals

Optimization over function spaces is a natural extension of optimization over the reals. So, it’s important to remember exactly how optimization works on functions of real vectors. Consider the following function g:RNRg : \mathbb{R}^N \to \mathbb{R}.

g(X)=(x1)2+(x2)2g(X) = (x_1)^2 + (x_2)^2

where X=(x1,x2)X = (x_1, x_2). The directional derivative in the direction V=(v1,v2)V = (v_1, v_2) is

dgV(X)=limϵ0[g(X+ϵV)g(X)ϵ]=g(X)V=[2x12x2]V\begin{align*} dg_V(X) &= \lim_{\epsilon \to 0} \left[ \frac{g(X + \epsilon V) - g(X)}{\epsilon} \right] \\ &= \nabla g(X) \cdot V \\ &= \begin{bmatrix} 2 x_1 \\ 2 x_2 \end{bmatrix} \cdot V \\ \end{align*}

If XX^\star is a minimum, there cannot be a direction, VV, such that dgV(X)<0dg_V(X^\star) < 0. This would imply that there exists an ϵ>0\epsilon > 0 such that g(X+ϵV)<g(X)g(X^\star + \epsilon V) < g(X^\star). This contradicts the assumption that XX^\star is a minimum. In addition, there cannot be a VV such that dgV(X)>0dg_V(X^\star) > 0 because his would imply dg(V)(X)=dgV(X)<0dg_{(-V)}(X^\star) = - dg_{V}(X^\star) < 0.

Therefore, we know that for any XX^\star, dgV(X)=0dg_V(X^\star) = 0 for all VR2V \in \mathbb{R}^2. This is only possible if g(X)=(0,0)\nabla g(X^\star) = (0, 0) which means g(X)x1=2x1=0\frac{\partial g(X^\star)}{\partial x_1} = 2 x_1 = 0 and g(X)x2=2x2=0\frac{\partial g(X^\star)}{\partial x_2} = 2 x_2 = 0. So, the only local stationary point is at (0,0)(0,0). However, stationarity is a necessary but not sufficient condition. We would need to test for convexity to ensure that this is a minimum. Though, it obviously is.

Functional optimization follows the same principles.

On continuous functions

Suppose there is a known continuous function ϕ\phi that we want to approximate with another continuous function using least squares on the interval [0,1][0, 1]. That is, you want to minimize the following functional

L[f]=01(f(x)ϕ(x))2dx.L[f] = \int_0^1 (f(x) - \phi(x))^2\, dx.

Of course, the minimum is going to be at f=ϕf^* = \phi. We will solve this by finding the Gateaux differential of LL with respect to some direction, ψ\psi. This is defined in the same way as the directional derivative on the reals

dLψ[f]=limϵ0[L[f+ϵψ]L[f]ϵ].dL_{\psi}[f] = \lim_{\epsilon \to 0} \left[ \frac{L[f + \epsilon \psi] - L[f]}{\epsilon} \right].

So, the Gateaux differential of LL is

dLψ[f]=limϵ0[01(f(x)+ϵψ(x)ϕ(x))2(f(x)ϕ(x))2ϵdx]=limϵ0[012f(x)2ϕ(x)+ϵψ(x)ϵϵψ(x)dx]=012(f(x)ϕ(x))ψ(x)dx\begin{align*} dL_{\psi}[f] &= \lim_{\epsilon \to 0} \left[ \int_0^1 \frac{ (f(x) + \epsilon \psi(x) - \phi(x))^2 - (f(x) - \phi(x))^2}{\epsilon} \, dx \right] \\ &= \lim_{\epsilon \to 0} \left[ \int_0^1 \frac{ 2f(x) - 2\phi(x) + \epsilon\psi(x)}{\epsilon} \epsilon\psi(x) \, dx \right] \\ &= \int_0^1 2 (f(x) - \phi(x)) \psi(x) \, dx\\ \end{align*}

which is the inner product of the functional derivative,

δLδf(x)=2(f(x)ϕ(x)),\frac{\delta L}{\delta f}(x) = 2 (f(x) - \phi(x)),

and the direction, ψ\psi. Like in calculus on the reals, we need this to be zero for all directions. It turns out that, just like before, we only need to set the directional derivative to zero. This is due to the following lemma.

Fundamental lemma of calculus of variations If gg is continuous on [a,b][a,b] and satisfies the equality

abg(x)ψ(x)dx=0\int_a^b g(x) \psi(x) \, dx = 0

for all continuous functions ψ\psi such that ψ(a)=ψ(b)=0\psi(a)=\psi(b)=0, then f(x)=0f(x) = 0 for all x[a,b]x \in [a,b].

This means that dLψ[f]=0dL_{\psi}[f]=0 if and only if 2(f(x)ϕ(x))=02 (f(x) - \phi(x)) = 0 for all x[0,1]x \in [0,1]. So, f=ϕf = \phi as anticipated.


github mail linkedin gscholar XMPP pdf slides link arrow down