In the notes, Γ0(C) denotes the set of proper, lsc convex functions on a non-empty convex set C⊆Rn. Remember that we assume C⊆dom f, where dom f denotes the domain of the function of interest f.
A proper convex function f takes a finite value for at least one x∈C (i.e., ∃x∈C such that f(x)<∞) and never takes the value −∞ (i.e., f(x)>−∞ for all x∈C).
A lsc (lower semicontinuous) function is such that

f(x) ≤ lim_{i→∞} f(x_i),  (1)

for any sequence x_1, x_2, … in C that converges to x and such that the limit exists (see Rockafellar (1970), section 7). The figure below illustrates this (note that the functions shown are of course not convex).
For the purpose of these notes, we will always assume that f is a proper convex function and usually assume that it is also lsc.
Example: the function f(x)=∣x∣+i_[−1,1](x) is in Γ0(R). Clearly it is a proper convex function on R, and it is lsc on R∖{−1,1} since it is continuous there. Then, it is easy to see that for any sequence x_1, x_2, … in R converging to 1 (resp. −1), the limit lim_{i→∞} f(x_i), when it exists, is either ∞ or f(1) (resp. f(−1)), so that (1) holds.
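To make this concrete, here is a minimal numerical sketch (the two sequences below are illustrative choices, not from the notes) probing inequality (1) at x=1:

```python
# Probe lower semicontinuity of f(x) = |x| + i_[−1,1](x) at x = 1
# along two illustrative sequences converging to 1.
def f(x):
    return abs(x) + (0.0 if -1.0 <= x <= 1.0 else float("inf"))

inside = [1 - 1 / i for i in range(1, 1000)]   # approaches 1 from inside [−1, 1]
outside = [1 + 1 / i for i in range(1, 1000)]  # approaches 1 from outside

print(f(1.0))          # 1.0
print(f(inside[-1]))   # ≈ 1.0: lim f(x_i) = 1 = f(1), so (1) holds with equality
print(f(outside[-1]))  # inf:  lim f(x_i) = ∞ ≥ f(1), so (1) holds trivially
```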
We say that y∈Rn is a subgradient of the convex function f at x∈C if it satisfies the following inequality:
f(z) ≥ f(x) + ⟨z−x, y⟩, ∀z∈C.  (2)
The inequality (2) simply indicates that the graph of the function f at x is supported by the hyperplane defined by the right-hand side. A subgradient is thus the "slope" of one such supporting hyperplane.
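In one dimension this is easy to check numerically. The following sketch (the test points and candidate slopes are illustrative choices of our own) tests candidate subgradients of f(x)=∣x∣ against inequality (2) on a grid:

```python
import numpy as np

# Test whether a candidate slope y satisfies the subgradient
# inequality (2) for f at the point x, over a grid of z values.
def is_subgradient(f, x, y, zs, tol=1e-12):
    return all(f(z) >= f(x) + y * (z - x) - tol for z in zs)

zs = np.linspace(-5.0, 5.0, 1001)
print(is_subgradient(abs, 0.0, 0.5, zs))  # True:  0.5 is a valid slope at the kink
print(is_subgradient(abs, 0.0, 1.5, zs))  # False: the line z ↦ 1.5 z cuts the graph
print(is_subgradient(abs, 2.0, 1.0, zs))  # True:  the classical gradient away from 0
```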
The function is differentiable at x if and only if there is a unique subgradient at x (the classical gradient ∇f(x)) and, correspondingly, only one supporting hyperplane:
f(z)≥f(x)+⟨z−x,∇f(x)⟩,∀z∈C.
However, if the function is not differentiable at x (e.g., if there is a kink at x) then there may be infinitely many supporting hyperplanes and infinitely many subgradients.
The set of subgradients of a convex function f at a point x∈domf is called the subdifferential of f and denoted ∂f(x). For a proper convex function f, it can be shown that the subdifferential of f is a non-empty bounded set at any point x∈(domf)∘ (Rockafellar (1970), theorem 23.4).
Note that since we have assumed that C⊆domf, then C∘⊆(domf)∘ and therefore ∂f is non-empty and bounded on C∘.
An example is the absolute value function f(x)=∣x∣ which is not differentiable at 0. It is however supported at that point by all lines of the form ℓα(x)=αx with α∈[−1,1] (see the dashed lines on the figure below). The set [−1,1] is therefore the subdifferential of the function at 0, denoted ∂f(0).
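The claim that every such line supports the function can be verified directly; a minimal sketch (the grid and tolerance are arbitrary choices):

```python
import numpy as np

# Verify that each line ℓ_α(x) = αx with α ∈ [−1, 1] lies below |x|
# everywhere (touching it at 0), i.e. that α is a subgradient of |⋅| at 0.
xs = np.linspace(-2.0, 2.0, 401)
for alpha in np.linspace(-1.0, 1.0, 9):
    assert np.all(alpha * xs <= np.abs(xs) + 1e-12)
print("every α in [−1, 1] gives a supporting line at 0")
```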
A point x†∈C is a minimiser of the function f if and only if f(z)≥f(x†) for all z∈C. This can be written equivalently as:
f(z)≥f(x†)+⟨z−x†,0⟩,∀z∈C,
and hence 0 must be a subgradient of f at x†.
First-order optimality condition (FOC): for a proper convex function f,
x† ∈ argmin_{x∈C} f(x) ⟺ 0 ∈ ∂f(x†).
Note: some care must be taken when x† ∈ argmin_{x∈C} f(x) is on the boundary of C, as there may not be a subgradient there; this is why we had originally assumed that f is minimised on the interior of C. We will come back to this when discussing optimisation methods for constrained problems such as projected gradient descent.
If we take the subdifferential as an operator then, intuitively, looking for a minimiser amounts to "inverting" the subdifferential and evaluating it at 0, i.e., x† = (∂f)⁻¹(0). Of course, at this point we don't know how to compute (∂f)⁻¹ in general. We shall come back to this in more detail, but the idea of inverting an operator involving the subdifferential to find the minimiser is key in convex optimisation.
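As a preview of this idea, here is a hedged sketch (the penalised quadratic f below and the values of a and λ are illustrative, not from the notes): for f(x) = ½(x−a)² + λ∣x∣, solving 0 ∈ x† − a + λ∂∣⋅∣(x†) in closed form gives the well-known soft-thresholding formula, which we can check against a brute-force grid search:

```python
import numpy as np

# Solving 0 ∈ ∂f(x†) for f(x) = 0.5 (x − a)² + λ|x| amounts to inverting
# x ↦ x − a + λ∂|⋅|(x) at 0, which yields soft-thresholding.
def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

a, lam = 1.3, 0.5
xs = np.linspace(-3.0, 3.0, 600001)
brute = xs[np.argmin(0.5 * (xs - a) ** 2 + lam * np.abs(xs))]
print(soft_threshold(a, lam))  # 0.8
print(brute)                   # ≈ 0.8, matching the closed-form inverse
```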
Note that in some simple situations the FOC is sufficient to immediately find a minimiser. For instance, if f(x)=∣x∣, then the subdifferential is

∂f(x) = {sign(x)} if x ≠ 0, and ∂f(x) = [−1, 1] if x = 0,
which shows that the only point x† where 0∈∂f(x†) is x†=0. In other words, (∂f)⁻¹(0)=0.
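A quick numerical confirmation (the helper subdiff_abs is a hypothetical name encoding the piecewise formula above as an interval):

```python
# Scan a few points and test whether 0 ∈ ∂f(x) for f(x) = |x|.
def subdiff_abs(x):
    if x > 0:
        return (1.0, 1.0)    # {sign(x)} = {+1}
    if x < 0:
        return (-1.0, -1.0)  # {sign(x)} = {−1}
    return (-1.0, 1.0)       # the full interval [−1, 1] at the kink

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    lo, hi = subdiff_abs(x)
    print(x, lo <= 0.0 <= hi)  # True only at x = 0, so (∂f)⁻¹(0) = {0}
```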
Let f_i: C→R be proper convex functions; then ∂(∑_i f_i) ⊇ ∑_i ∂f_i.
Indeed, let g ≡ ∑_i f_i and let y_i ∈ ∂f_i(x). Then, by definition,

f_i(z) ≥ f_i(x) + ⟨z−x, y_i⟩, ∀z∈C,

and we can sum these inequalities to get

g(z) ≥ g(x) + ⟨z−x, ∑_i y_i⟩, ∀z∈C,

so that ∑_i y_i ∈ ∂g(x). Note that if 0 ∈ ∑_i ∂f_i(x†) then the inclusion implies 0 ∈ ∂(∑_i f_i)(x†), which is sufficient to show that x† is a minimiser.
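A numerical sanity check of this inclusion (the functions f1, f2, the point x, and the element y1 below are illustrative choices):

```python
import numpy as np

# With f1(x) = |x| and f2(x) = (x − 1)², any y1 ∈ ∂f1(0) = [−1, 1]
# plus y2 = ∇f2(0) should be a subgradient of g = f1 + f2 at 0,
# per the inclusion ∂(f1 + f2) ⊇ ∂f1 + ∂f2.
g = lambda z: np.abs(z) + (z - 1.0) ** 2
zs = np.linspace(-3.0, 3.0, 1201)

x, y1 = 0.0, 0.7      # y1 is an arbitrary element of ∂f1(0) = [−1, 1]
y2 = 2.0 * (x - 1.0)  # ∇f2(0) = −2
y = y1 + y2
print(np.all(g(zs) >= g(x) + y * (zs - x) - 1e-12))  # True: y ∈ ∂g(0)
```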
Boyd and Vandenberghe, Subgradients, 2008. – Accessible lecture notes introducing the subgradient and proving that the subdifferential of a convex function is non-empty and closed at any point in the interior of the domain of the function.