optimisation Let us start with the definition of the subdifferential of a convex function f at a point x∈C⊆Rn:
∂f(x)={y∈Rn∣f(z)≥f(x)+⟨z−x,y⟩,∀z∈C}, We can rearrange terms in the condition as follows: ∂f(x)={y∈Rn∣⟨z,y⟩−f(z)≤⟨x,y⟩−f(x),∀z∈C}. However, since the condition must hold for all z, it must equivalently hold for any z that maximises the lower bound. Note that the maximum of that lower bound tightens the inequality exactly (take z=x). We can thus write the subdifferential as
∂f(x)={y∈Rn∣z∈Cmax[⟨z,y⟩−f(z)]=⟨x,y⟩−f(x)}. We can now introduce the convex conjugate of a function (also sometimes known as the Fenchel-Legendre convex conjugate or a combination of those words).
In the sequel, we write R the extended real line: R=R∪{±∞}.
Let f:C→R, a function, the convex conjugate of f is the function f⋆(y):Rn→R with:
f⋆(y):=z∈Csup[⟨z,y⟩−f(z)]. Note: it's easy to show that the convex conjugate of a function f is always convex (even if the function f is not convex).
We can now use the convex conjugate in (3): ∂f(x)={y∈Rn∣f⋆(y)=⟨x,y⟩−f(x)}, which is a convenient characterisation that will often be used.
Example: we can consider a simple (and yet quite useful) example for the convex conjugate: if we define ψ(x):=∥x∥2/2, its convex conjugate is then
ψ⋆(y)=zsup⟨z,y⟩−21⟨z,z⟩. The problem in the right hand side is easy to solve: the objective function is differentiable and the FOC gives y−z♯=0 so that ψ⋆(y)=ψ(y).
Another nice example is f(x)=xlog(x), it's an easy exercise to show that f⋆(y)=exp(y−1) and f⋆⋆(x)=f(x).
(
FMT) Let
f:C→R and
f⋆ its convex conjugate, then:
f(x)+f⋆(y)≥⟨x,y⟩,∀x∈C,y∈Rn. This is known as
Fenchel's inequality.
This inequality is directly implied by the definition of the convex conjugate.
Let
f∈Γ0(C) then
f⋆⋆≡f on
C. This is known as the
Fenchel-Moreau theorem (FMT).
Proving the theorem in full generality requires a bit of care (see references). It is however relatively straightforward to show that the result holds on C∘ (the interior of C) without requiring extra results. We will go about it in two parts:
show that f(z)≥f⋆⋆(z) for all z∈C,
show that f(z)≤f⋆⋆(z) for all z∈C∘.
By definition of the convex conjugate,
f⋆⋆(z)=y∈Rnsup⟨z,y⟩−f⋆(y) but since −f⋆(y)=infx∈C[f(x)−⟨x,y⟩], we have that for any y and z, ⟨z,y⟩−f⋆(y)≤⟨z−x,y⟩+f(x),∀x∈C. In particular, for any z∈C, we can take x=z so that f⋆⋆(z)≤f(z) on C.
Starting again from (8), we can write f⋆⋆(z)≥⟨z,y⟩−f⋆(y),∀y∈Rn. If we now consider x∈C∘, we know that ∂f(x) is non-empty and so can pick y∈∂f(x). But for such a y using the characterisation of the subdifferential (5) we can write f⋆(y)=⟨x,y⟩−f(x) so that f⋆⋆(z)≥⟨z−x,y⟩+f(x). For z∈C∘ we can pick x=z and get f⋆⋆(z)≥f(z).
Consider the definition of the subdifferential (5), for any x∈C with y∈∂(x) we have (x∈C,y∈∂f(x))⟺f⋆(y)=⟨x,y⟩−f(x). With exactly the same definition, we have for x∈∂f⋆(y) x∈∂f⋆(y)⟺f⋆⋆(x)=⟨x,y⟩−f⋆(y), and the last equality can be rearranged to f⋆(y)=⟨x,y⟩−f⋆⋆(x). In the case where f∈Γ0(C) then f≡f⋆⋆ on C by the Fenchel-Moreau theorem which gives the following result.
(
dual pair) Let
f∈Γ0(C) then the following equivalence holds
(x∈C,y∈∂f(x))⟺x∈∂f⋆(y)∩C. Such a pair of points is called a
dual pair and verifies
f(x)+f⋆(y)=⟨x,y⟩.
Using the inverse of the subdifferential operator on the left hand side of (14), we have that x∈(∂f)−1(y)∩C is equivalent to x∈∂f⋆(y)∩C which exposes a close link between (∂f)−1 and ∂f⋆. This link becomes even more apparent in the unconstrained case.
Let
f∈Γ0(Rn), then
(∂f)−1≡∂f⋆. Note: recall that, with the first order condition, a minimiser x† is such that 0∈∂f(x†) or (∂f)−1(0)∋x†. We can now also write this x†∈∂f⋆(0), thereby expressing the minimiser in terms of the convex-conjugate. Of course it's not clear at this point whether this helps at all to find the minimiser but, as it turns out, it will.
Example: let's consider f(x)=xlog(x) on C=[0,1]. We know that f⋆(y)=exp(y−1). Both f and f⋆ are differentiable so ∂f(x)={∇f(x)} with ∇f(x)=logx+1 on C and ∂f⋆(y)={∇f⋆(y)} with ∇f⋆(y)=exp(y−1) on R.
Gathering the pieces, if we take x∈C and y=∇f(x), we have y=logx+1⟹x=exp(y−1)=∇f⋆(y) and conversely if we consider a y such that exp(y−1)∈C then x=exp(y−1)⟹y=logx+1=∇f(x) So that the equivalence (14) holds as expected.
Note: in this page (and also further) we usually consider C∘ instead of C to make the presentation simpler since we know that on C∘, subgradients exist and are bounded. Of course this is done at the expense of generality and some care is needed to generalise properly but the extra details did not seem particularly relevant at the level of this presentation. The references listed below show (and prove) the results in their full glory.
Hiriart-Urruty, A note on the Legendre-Fenchel transform of convex composite functions, 2006. – This is a more technical note that you may find interesting if you would like more details on convex conjugacy.
Mete Soner, Convex analysis and Fenchel-Moreau theorem, 2015. – Lecture notes covering convex conjugacy in more depth and proving rigorously the Fenchel-Moreau theorem.
See also the general references mentioned in the introduction.