Convex analysis (part II)

The convex conjugate

Let us start with the definition of the subdifferential of a convex function ff at a point xCRnx\in C \subseteq \mathbb R^n:

f(x)={yRn f(z) f(x)+zx,y, zC},\begin{array}{rcl} \partial f(x) &=& \{y\in \mathbb R^n\,|\, f(z)\,\ge\, f(x)+\langle z-x,y\rangle, \,\forall z\in C\},\end{array} We can rearrange terms in the condition as follows: f(x)={yRn z,yf(z) x,yf(x), zC}.\begin{array}{rcl} \partial f(x) &=& \{y\in\mathbb R^n\,|\, \langle z,y\rangle - f(z)\,\le\, \langle x,y\rangle - f(x), \,\forall z\in C\}.\end{array} However, since the condition must hold for all zz, it must equivalently hold for any zz that maximises the lower bound. Note that the maximum of that lower bound tightens the inequality exactly (take z=xz=x). We can thus write the subdifferential as

f(x) ⁣ ⁣= ⁣ ⁣{yRn maxzC [z,yf(z)] = x,yf(x)}. \partial f(x) \quad\!\! =\quad\!\! \{y\in \mathbb R^n \,|\, \max_{z\in C} \,\, [\langle z,y\rangle -f(z)] \,=\, \langle x,y\rangle -f(x)\}.

We can now introduce the convex conjugate of a function (also sometimes known as the Fenchel-Legendre convex conjugate or a combination of those words).
In the sequel, we write R\overline{\mathbb R} the extended real line: R=R{±}\overline{\mathbb R} = \mathbb R \cup \{\pm\infty\}.

Let f:CRf: C\to \overline{\mathbb R}, a function, the convex conjugate of ff is the function f(y):RnRf^\star(y):\mathbb R^n\to \overline{\mathbb R} with:

f(y) ⁣ ⁣:= ⁣ ⁣supzC [z,yf(z)]. f^\star(y) \quad\!\! :=\quad\!\! \sup_{z\in C} \,\,[\langle z,y \rangle - f(z)].

Note: it's easy to show that the convex conjugate of a function ff is always convex (even if the function ff is not convex).

We can now use the convex conjugate in (3): f(x)={yRn f(y) = x,yf(x)},\begin{array}{rcl} \partial f(x) &=& \{ y\in\mathbb R^n \,|\, f^{\star}(y) \,=\, \langle x,y\rangle - f(x)\}, \end{array} which is a convenient characterisation that will often be used.

Example: we can consider a simple (and yet quite useful) example for the convex conjugate: if we define ψ(x):=x2/2\psi(x):=\|x\|^2/2, its convex conjugate is then

ψ(y) ⁣ ⁣= ⁣ ⁣supz z,y12z,z. \psi^\star(y) \quad\!\! =\quad\!\! \sup_{z}\,\, \langle z,y\rangle - \frac12\langle z,z\rangle.

The problem in the right hand side is easy to solve: the objective function is differentiable and the FOC gives yz=0y-z^\sharp = 0 so that ψ(y)=ψ(y)\psi^\star(y)= \psi(y).

Another nice example is f(x)=xlog(x)f(x)=x\log(x), it's an easy exercise to show that f(y)=exp(y1)f^\star(y)=\exp(y-1) and f(x)=f(x)f^{\star\star}(x) = f(x).

Fenchel's inequality and the Fenchel-Moreau theorem

(FMT) Let f:CRf:C\to \overline{\mathbb R} and ff^\star its convex conjugate, then: f(x)+f(y)x,y,xC,yRn.\begin{array}{rcl} f(x) + f^\star(y) &\ge & \langle x,y\rangle, \quad \forall x\in C,y\in \mathbb R^n.\end{array} This is known as Fenchel's inequality.

This inequality is directly implied by the definition of the convex conjugate.

Let fΓ0(C)f\in \Gamma_0(C) then fff^{\star\star}\equiv f on CC. This is known as the Fenchel-Moreau theorem (FMT).

Proving the theorem in full generality requires a bit of care (see references). It is however relatively straightforward to show that the result holds on CC^\circ (the interior of CC) without requiring extra results. We will go about it in two parts:

  1. show that f(z)f(z)f(z)\ge f^{\star\star}(z) for all zCz\in C,

  2. show that f(z)f(z)f(z)\le f^{\star\star}(z) for all zCz\in C^\circ.

FMT (pt.1)

By definition of the convex conjugate,

f(z) ⁣ ⁣= ⁣ ⁣supyRnz,yf(y) f^{\star\star}(z) \quad\!\! =\quad\!\! \sup_{y\in\mathbb R^n} \left\langle z,y\right\rangle-f^\star(y)

but since f(y)=infxC[f(x)x,y]-f^\star(y)=\inf_{x\in C} [f(x)-\left\langle x,y\right\rangle], we have that for any yy and zz, z,yf(y)zx,y+f(x),xC.\begin{array}{rcl} \left\langle z,y\right\rangle - f^\star(y) &\le& \left\langle z-x,y\right\rangle + f(x), \quad\forall x\in C.\end{array} In particular, for any zCz\in C, we can take x=zx=z so that f(z)f(z)f^{\star\star}(z) \le f(z) on CC.

FMT (pt.2)

Starting again from (8), we can write f(z)z,yf(y),yRn.\begin{array}{rcl} f^{\star\star}(z) &\ge& \left\langle z,y\right\rangle - f^\star(y), \quad \forall y \in\mathbb R^n.\end{array} If we now consider xCx\in C^\circ, we know that f(x)\partial f(x) is non-empty and so can pick yf(x)y\in \partial f(x). But for such a yy using the characterisation of the subdifferential (5) we can write f(y)=x,yf(x)\begin{array}{rcl} f^\star(y) &=& \left\langle x, y\right\rangle - f(x)\end{array} so that f(z)zx,y+f(x)f^{\star\star}(z) \ge \left\langle z-x, y\right\rangle + f(x). For zCz\in C^\circ we can pick x=zx=z and get f(z)f(z)f^{\star\star}(z)\ge f(z).

Subdifferential of the convex conjugate

Consider the definition of the subdifferential (5), for any xCx\in C with y(x)y\in \partial(x) we have (xC,yf(x))f(y)=x,yf(x).\begin{array}{rcl} (x\in C, y\in\partial f(x)) &\Longleftrightarrow& f^\star(y)=\left\langle x, y\right\rangle-f(x).\end{array} With exactly the same definition, we have for xf(y)x\in\partial f^\star(y) xf(y)f(x)=x,yf(y),\begin{array}{rcl} x\in\partial f^\star(y) &\Longleftrightarrow& f^{\star\star}(x)=\left\langle x, y\right\rangle-f^\star(y),\end{array} and the last equality can be rearranged to f(y)=x,yf(x)f^\star(y)=\left\langle x,y\right\rangle-f^{\star\star}(x). In the case where fΓ0(C)f\in\Gamma_0(C) then fff\equiv f^{\star\star} on CC by the Fenchel-Moreau theorem which gives the following result.

(dual pair) Let fΓ0(C)f\in\Gamma_0(C) then the following equivalence holds (xC,yf(x))xf(y)C.\begin{array}{rcl} (x\in C, y \in \partial f(x)) &\Longleftrightarrow& x \in \partial f^\star(y) \cap C. \end{array} Such a pair of points is called a dual pair and verifies f(x)+f(y)=x,yf(x)+f^\star(y)=\left\langle x,y\right\rangle.

Using the inverse of the subdifferential operator on the left hand side of (14), we have that x(f)1(y)Cx\in(\partial f)^{-1}(y)\cap C is equivalent to xf(y)Cx\in\partial f^\star(y)\cap C which exposes a close link between (f)1(\partial f)^{-1} and f\partial f^\star. This link becomes even more apparent in the unconstrained case.

Let fΓ0(Rn)f\in\Gamma_0(\mathbb R^n), then (f)1f.\begin{array}{rcl} (\partial f)^{-1} &\equiv & \partial f^\star.\end{array}

Note: recall that, with the first order condition, a minimiser xx^\dagger is such that 0f(x)0\in \partial f(x^\dagger) or (f)1(0)x(\partial f)^{-1}(0) \ni x^\dagger. We can now also write this xf(0)x^\dagger \in \partial f^\star(0), thereby expressing the minimiser in terms of the convex-conjugate. Of course it's not clear at this point whether this helps at all to find the minimiser but, as it turns out, it will.

Example: let's consider f(x)=xlog(x)f(x)=x\log(x) on C=[0,1]C = [0, 1]. We know that f(y)=exp(y1)f^\star(y)=\exp(y-1). Both ff and ff^\star are differentiable so f(x)={f(x)}\partial f(x)=\{\nabla f(x)\} with f(x)=logx+1\nabla f(x)=\log x +1 on CC and f(y)={f(y)}\partial f^\star(y)=\{\nabla f^\star(y)\} with f(y)=exp(y1)\nabla f^\star(y)=\exp(y-1) on R\mathbb R.

Gathering the pieces, if we take xCx\in C and y=f(x)y=\nabla f(x), we have y = logx+1x = exp(y1) = f(y)\begin{array}{rcl} y \,=\,\log x+1 &\Longrightarrow& x \,=\,\exp(y-1) \,=\,\nabla f^\star(y)\end{array} and conversely if we consider a yy such that exp(y1)C\exp(y-1)\in C then x = exp(y1)y = logx+1 = f(x)\begin{array}{rcl} x \,=\,\exp(y-1) &\Longrightarrow& y \,=\,\log x + 1 \,=\,\nabla f(x)\end{array} So that the equivalence (14) holds as expected.

Additional references

Note: in this page (and also further) we usually consider CC^\circ instead of CC to make the presentation simpler since we know that on CC^\circ, subgradients exist and are bounded. Of course this is done at the expense of generality and some care is needed to generalise properly but the extra details did not seem particularly relevant at the level of this presentation. The references listed below show (and prove) the results in their full glory.

  1. Hiriart-Urruty, A note on the Legendre-Fenchel transform of convex composite functions, 2006. – This is a more technical note that you may find interesting if you would like more details on convex conjugacy.

  2. Mete Soner, Convex analysis and Fenchel-Moreau theorem, 2015. – Lecture notes covering convex conjugacy in more depth and proving rigorously the Fenchel-Moreau theorem.

See also the general references mentioned in the introduction.