Convex analysis (part I)
In these notes, $\Gamma_0(C)$ denotes the set of proper and lsc convex functions on a non-empty convex set $C \subseteq \mathbb{R}^n$. Remember that we assume the minimiser of the function of interest lies in the interior of its domain.
a proper convex function $f$ takes a finite value for at least one point (i.e.: $\exists x$ such that $f(x) < +\infty$) and never takes the value $-\infty$ (i.e.: $f(x) > -\infty$ for all $x$).
a lsc (lower semi-continuous) function is such that

$$ f(x) \;\le\; \lim_{k\to\infty} f(x_k) \qquad (1) $$

for any sequence $(x_k)$ in $\mathrm{dom}\, f$ that converges to $x$ and such that the limit exists (see Rockafellar (1970), section 7). The figure below illustrates this (note that the functions are of course not convex).
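To make inequality (1) concrete, here is a minimal numerical sketch in Python with two made-up step functions (not the ones from the figure): one takes the lower value at its jump and satisfies (1) there, the other takes the upper value and does not.

```python
# Minimal numerical probe of inequality (1) at x = 0 for two made-up step
# functions (neither is convex; lsc is a separate property from convexity).
def f_lsc(x):      # takes the *lower* value at the jump
    return 0.0 if x <= 0 else 1.0

def f_not_lsc(x):  # takes the *upper* value at the jump
    return 1.0 if x <= 0 else 0.0

# A sequence converging to 0 from the right, along which the limit of the
# function values exists (the values are constant along the sequence).
xs = [1.0 / k for k in range(1, 200)]

for f in (f_lsc, f_not_lsc):
    limit = f(xs[-1])
    print(f.__name__, f(0.0) <= limit)
# f_lsc     True   -> (1) holds at 0 along this sequence
# f_not_lsc False  -> lsc fails at 0
```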
For the purpose of these notes, we will always assume that $f$ is a proper convex function and usually assume that it is also lsc.
Example: the indicator function $\iota_{[0,1]}$ (equal to $0$ on $[0,1]$ and $+\infty$ outside) is in $\Gamma_0(\mathbb{R})$. Clearly it is a proper convex function on $\mathbb{R}$ and it is trivially lsc on $\mathbb{R}\setminus\{0,1\}$. Then, it's easy to see that for any sequence converging to $0$ (resp. $1$) the function values are eventually either $0$ or $+\infty$, both at least $\iota_{[0,1]}(0) = 0$ (resp. $\iota_{[0,1]}(1) = 0$), so that (1) holds.
Subgradient, subdifferential and FOC
A vector $u$ is a subgradient of $f$ at $x$ if

$$ f(y) \;\ge\; f(x) + \langle y - x,\, u\rangle \quad \text{for all } y. \qquad (2) $$

The inequality (2) simply indicates that the graph of the function at $x$ is supported by the hyperplane defined by the right-hand side. A subgradient is thus the "slope" of one such supporting hyperplane.
The function $f$ is differentiable at $x$ if and only if there is a unique subgradient at $x$ (the classical gradient $\nabla f(x)$) and, correspondingly, only one supporting hyperplane:

$$ f(y) \;\ge\; f(x) + \langle y - x,\, \nabla f(x)\rangle \quad \text{for all } y. $$
However, if the function is not differentiable at $x$ (e.g., if there is a kink at $x$) then there may be infinitely many supporting hyperplanes and infinitely many subgradients.
Note that since we have assumed that $f \in \Gamma_0(C)$, the set of subgradients of $f$ at $x$ is non-empty and bounded at any $x$ in the interior of $\mathrm{dom}\, f$ (see the Boyd and Vandenberghe notes in the references).
An example is the absolute value function $x \mapsto |x|$ which is not differentiable at $0$. It is however supported at that point by all lines of the form $y = ux$ with $u \in [-1, 1]$ (see the dashed lines on the figure below). The set $[-1, 1]$ is therefore the subdifferential of the function at $0$, denoted $\partial |\cdot|(0) = [-1, 1]$.
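As a quick sanity check of which slopes are subgradients of the absolute value at $0$, the sketch below tests inequality (2) on a grid of points (the grid and the test values are arbitrary).

```python
import numpy as np

# Subgradient inequality (2) for f(x) = |x| at x = 0 reads |y| >= u*y for all y;
# it holds exactly for u in [-1, 1]. We test it on an (arbitrary) grid of y values.
y = np.linspace(-5.0, 5.0, 1001)

def is_subgradient_at_zero(u):
    return bool(np.all(np.abs(y) >= u * y))

print(is_subgradient_at_zero(0.5))   # True:  0.5 is in [-1, 1]
print(is_subgradient_at_zero(-1.0))  # True: -1.0 is in [-1, 1]
print(is_subgradient_at_zero(1.2))   # False: the line 1.2*y lies above the graph for y > 0
```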
First order optimality condition (FOC)
A point $x^\star$ is a minimiser of the function $f$ if and only if $f(y) \ge f(x^\star)$ for all $y$. This can be written equivalently as:

$$ f(y) \;\ge\; f(x^\star) + \langle y - x^\star,\, 0\rangle \quad \text{for all } y, $$

and hence $0$ must be a subgradient of $f$ at $x^\star$.
First-order optimality condition (FOC): for a proper convex function $f$,

$$ x^\star \;\in\; \operatorname{argmin}_x f(x) \quad\Longleftrightarrow\quad 0 \in \partial f(x^\star). $$
Note: some care must be taken when $x^\star$ is on the boundary of $\mathrm{dom}\, f$ as there may not be a subgradient there; this is why we had originally assumed that $f$ is minimised on the interior of its domain. We will come back to this when discussing optimisation methods for constrained problems such as projected gradient descent.
If we take the subdifferential as an operator $\partial f$ then, intuitively, looking for a minimiser amounts to "inverting" the subdifferential and evaluating it at $0$, i.e.: $x^\star \in (\partial f)^{-1}(0)$. Of course, at this point we don't know how to compute $(\partial f)^{-1}$ in general. We shall come back to this in more detail, but the idea of inverting an operator involving the subdifferential to find a minimiser is key in convex optimisation.
Note that in some simple situations the FOC is sufficient to immediately find a minimiser: for instance, if $f(x) = \tfrac12\|x - b\|_2^2$, then clearly the subdifferential is

$$ \partial f(x) \;=\; \{\, x - b \,\}, $$

which shows that the only point where $0 \in \partial f(x)$ is $x = b$. In other words, $x^\star = b$.
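As a small numerical sanity check of the FOC on this quadratic example (the vector $b$ below is arbitrary):

```python
import numpy as np

# FOC check for f(x) = 0.5 * ||x - b||^2 with an arbitrary b: the subdifferential
# reduces to the gradient {x - b}, which vanishes only at x = b.
rng = np.random.default_rng(0)
b = rng.normal(size=5)

f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad = lambda x: x - b

x_star = b
print(np.allclose(grad(x_star), 0.0))   # True: 0 is in the subdifferential at b
# b is indeed a minimiser: random perturbations never decrease f
print(all(f(x_star) <= f(x_star + d) for d in rng.normal(size=(100, 5))))  # True
```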
Subdifferential of a sum
For two convex functions $f$ and $g$, we have the inclusion $\partial f(x) + \partial g(x) \subseteq \partial (f+g)(x)$. Indeed, let $u \in \partial f(x)$ and let $v \in \partial g(x)$; then, by definition,

$$ f(y) \;\ge\; f(x) + \langle y - x,\, u\rangle \quad\text{and}\quad g(y) \;\ge\; g(x) + \langle y - x,\, v\rangle \quad \text{for all } y, $$

and we can sum across these inequalities to get

$$ (f+g)(y) \;\ge\; (f+g)(x) + \langle y - x,\, u + v\rangle \quad \text{for all } y, $$

so that $u + v \in \partial (f+g)(x)$. Note that if $0 \in \partial f(x) + \partial g(x)$ then the inclusion implies $0 \in \partial (f+g)(x)$, which is sufficient to show that $x$ is a minimiser of $f + g$.
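To see the sum rule and the FOC work together, here is a hypothetical one-dimensional problem: minimise $|x| + \tfrac12 (x - a)^2$ by looking for a point where $0 \in \partial f(x) + \partial g(x)$, and compare with a brute-force grid minimum.

```python
import numpy as np

# Hypothetical problem: minimise h(x) = |x| + 0.5*(x - a)^2 with f = |.|
# and g = 0.5*(. - a)^2. Here ∂f(x) = {sign(x)} for x != 0 and [-1, 1] at 0,
# while ∂g(x) = {x - a}. Solving 0 ∈ ∂f(x) + ∂g(x) gives the point below.
a = 2.3
x_star = np.sign(a) * max(abs(a) - 1.0, 0.0)

# Check the inclusion 0 ∈ ∂f(x*) + ∂g(x*)
if x_star == 0.0:
    ok = abs(x_star - a) <= 1.0          # 0 in [-1, 1] + {x* - a}
else:
    ok = np.isclose(np.sign(x_star) + x_star - a, 0.0)
print(ok)  # True

# Compare with a brute-force minimum of f + g on a fine grid.
grid = np.linspace(-5.0, 5.0, 200_001)
vals = np.abs(grid) + 0.5 * (grid - a) ** 2
print(x_star, grid[np.argmin(vals)])     # both approximately 1.3
```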
Additional references
Boyd and Vandenberghe, Subgradients, 2008. – Accessible lecture notes introducing the subgradient and proving that the subdifferential of a convex function is non-empty and closed at any point in the interior of the domain of the function.
Rockafellar, Convex Analysis, 1970.
See also the general references mentioned in the introduction.