Convex analysis (part I)


In the notes $\Gamma_0(C)$ denotes the set of proper and lsc convex functions on a non-empty convex set $C\subseteq \mathbb R^n$. Remember that we assume $C\subseteq \mathrm{dom}\, f$, the domain of the function $f$ of interest.

A function $f$ is lsc at $x$ if

$$ f(x) \;\le\; \lim_{i\to\infty} f(x_i) \tag{1} $$

for any sequence $x_1, x_2, \dots$ in $C$ that converges to $x$ and such that the limit exists (see Rockafellar (1970), section 7). The figure below illustrates this (note that the functions are of course not convex).

For the purpose of these notes, we will always assume that ff is a proper convex function and usually assume that it is also lsc.

Example: the function $f(x)=|x|+i_{[-1,1]}(x)$, where $i_{[-1,1]}$ is the convex indicator of $[-1,1]$ (zero on the set, $+\infty$ outside it), is in $\Gamma_0(\mathbb R)$. Clearly it is a proper convex function on $\mathbb R$ and is lsc on $\mathbb R\setminus\{-1,1\}$. Then, it's easy to see that for any sequence $x_1,x_2,\dots$ in $\mathbb R$ converging to $1$ (resp. $-1$) we have $\lim_{i\to\infty} f(x_i)=\infty$ or $f(1)$ (resp. $f(-1)$), so that (1) holds.
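As a quick numerical sanity check of this example (a minimal sketch; the helper `f` below is just my encoding of $|x|+i_{[-1,1]}(x)$):

```python
import numpy as np

def f(x):
    # f(x) = |x| + i_[-1,1](x): the indicator contributes 0 on [-1, 1]
    # and +infinity outside it.
    return abs(x) if -1.0 <= x <= 1.0 else np.inf

# Sequence converging to 1 from outside [-1, 1]: f is +inf along it,
# so the limit is +inf >= f(1).
outside = 1.0 + 1.0 / np.arange(1, 6)
print([f(x) for x in outside])   # [inf, inf, inf, inf, inf]

# Sequence converging to 1 from inside [-1, 1]: f(x_i) -> 1 = f(1).
inside = 1.0 - 1.0 / np.arange(1, 6)
print([f(x) for x in inside])    # ~ [0.0, 0.5, 0.667, 0.75, 0.8]

# In both cases f(1) <= lim f(x_i), which is exactly (1).
```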

Subgradient, subdifferential and FOC

We say that $y\in\mathbb R^n$ is a subgradient of the convex function $f$ at $x\in C$ if it verifies the following inequality:

$$ f(z) \;\ge\; f(x) + \langle z-x,\, y\rangle, \qquad \forall z\in C. \tag{2} $$

The inequality (2) simply indicates that the graph of the function $f$ is supported at $x$ by the hyperplane defined by the right-hand side. A subgradient is thus the "slope" of one such supporting hyperplane.

The function is differentiable at $x$ if and only if there is a unique subgradient at $x$ (the classical gradient $\nabla f(x)$) and, correspondingly, only one supporting hyperplane:

$$ f(z) \;\ge\; f(x) + \langle z-x,\, \nabla f(x)\rangle, \qquad \forall z\in C. $$

However, if the function is not differentiable at $x$ (e.g., if there is a kink at $x$) then there may be infinitely many supporting hyperplanes and infinitely many subgradients.
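To make the smooth case concrete, here is a small check (a sketch of mine, not from the notes) that for $f(x)=x^2$ the tangent with slope $\nabla f(x)=2x$ supports the graph everywhere, while any other slope fails somewhere:

```python
import numpy as np

f = lambda x: x ** 2          # differentiable convex function
x = 0.7                       # point at which we test supporting lines
zs = np.linspace(-3.0, 3.0, 601)

# The tangent with slope 2x supports the graph: f(z) >= f(x) + (z - x) * 2x.
assert np.all(f(zs) >= f(x) + (zs - x) * (2 * x) - 1e-12)

# A perturbed slope is no longer a subgradient: its line crosses the graph
# just to the right of x, consistent with the subgradient being unique.
assert np.any(f(zs) < f(x) + (zs - x) * (2 * x + 0.1))
```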

The set of subgradients of a convex function $f$ at a point $x\in\mathrm{dom}\, f$ is called the subdifferential of $f$ at $x$ and denoted $\partial f(x)$. For a proper convex function $f$, it can be shown that the subdifferential is a non-empty bounded set at any point $x\in(\mathrm{dom}\, f)^\circ$, the interior of the domain (Rockafellar (1970), theorem 23.4).

Note that since we have assumed that $C\subseteq\mathrm{dom}\, f$, we have $C^\circ\subseteq(\mathrm{dom}\, f)^\circ$ and therefore $\partial f$ is non-empty and bounded on $C^\circ$.

An example is the absolute value function $f(x)=|x|$, which is not differentiable at $0$. It is however supported at that point by all lines of the form $\ell_\alpha(x)=\alpha x$ with $\alpha\in[-1,1]$ (see the dashed lines on the figure below). The set $[-1,1]$ is therefore the subdifferential of the function at $0$, denoted $\partial f(0)$.
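A quick numerical counterpart of this picture (a minimal sketch): every slope $\alpha\in[-1,1]$ satisfies the subgradient inequality (2) at $x=0$, while slopes outside this interval do not.

```python
import numpy as np

zs = np.linspace(-2.0, 2.0, 401)
f = np.abs(zs)

# Each alpha in [-1, 1] gives a line alpha * z through the origin that
# stays below |z|: inequality (2) holds at x = 0, so alpha in ∂f(0).
for alpha in np.linspace(-1.0, 1.0, 9):
    assert np.all(f >= alpha * zs - 1e-12)

# A slope outside [-1, 1] crosses the graph and fails (2) somewhere.
assert np.any(f < 1.5 * zs)
```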

First-order optimality condition (FOC)

A point $x^\dagger\in C$ is a minimiser of the function $f$ if and only if $f(z)\ge f(x^\dagger)$ for all $z\in C$. This can be written equivalently as:

$$ f(z) \;\ge\; f(x^\dagger) + \langle z-x^\dagger,\, 0\rangle, \qquad \forall z\in C, $$

and hence $0$ must be a subgradient of $f$ at $x^\dagger$.

First-order optimality condition (FOC): for a proper convex function $f$,

$$ x^\dagger \,\in\, \arg\min_{x\in C}\, f(x) \quad\Longleftrightarrow\quad 0 \,\in\, \partial f(x^\dagger). $$

Note: some care must be taken when $x^\dagger \in \arg\inf_{x\in C} f(x)$ is on the boundary of $C$, as there may not be a subgradient there; this is why we had originally assumed that $f$ is minimised on the interior of $C$. We will come back to this when discussing optimisation methods for constrained problems such as projected gradient descent.

If we take the subdifferential as an operator then, intuitively, looking for a minimiser amounts to "inverting" the subdifferential and evaluating it at $0$, i.e., $x^\dagger = (\partial f)^{-1}(0)$. Of course, at this point we don't know how to compute $(\partial f)^{-1}$ in general. We shall come back to this in more detail, but the idea of inverting an operator involving the subdifferential to find the minimiser is key in convex optimisation.
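As a minimal worked example of this inversion (my own, not from the notes), take $f(x)=\frac12\|x-a\|^2$ for a fixed $a\in\mathbb R^n$. Here $f$ is differentiable with $\nabla f(x)=x-a$, so $\partial f(x)=\{x-a\}$ and

$$ (\partial f)^{-1}(0) \;=\; \{x : x-a=0\} \;=\; \{a\}, $$

i.e. $x^\dagger=a$, which is indeed the minimiser.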

Note that in some simple situations the FOC is sufficient to immediately find a minimiser. For instance, if $f(x)=|x|$, then clearly the subdifferential is

$$ \partial f(x) \;=\; \begin{cases} \{\mathrm{sign}(x)\} & (x\neq 0) \\ [-1, 1] & (x=0) \end{cases} $$

which shows that the only point $x^\dagger$ where $0\in\partial f(x^\dagger)$ is $x^\dagger=0$. In other words, $(\partial f)^{-1}(0)=0$.
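The same conclusion can be checked mechanically (a small sketch; `subdiff_abs` is a hypothetical helper representing the subdifferential above as an interval):

```python
def subdiff_abs(x):
    # Subdifferential of |.| at x, represented as an interval [lo, hi].
    if x > 0:
        return (1.0, 1.0)    # {sign(x)} = {1}
    if x < 0:
        return (-1.0, -1.0)  # {sign(x)} = {-1}
    return (-1.0, 1.0)       # the full interval [-1, 1] at the kink

def satisfies_foc(x):
    # FOC: x is a minimiser iff 0 lies in the subdifferential at x.
    lo, hi = subdiff_abs(x)
    return lo <= 0.0 <= hi

print([x for x in (-0.5, 0.0, 0.5) if satisfies_foc(x)])   # [0.0]
```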

Subdifferential of a sum

Let $f_i:C\to\mathbb R$ be proper convex functions; then

$$ \partial \sum_i f_i \;\supseteq\; \sum_i \partial f_i. $$

Indeed, let $g\equiv\sum_i f_i$ and let $y_i\in\partial f_i(x)$; then, by definition,

$$ f_i(z) \;\ge\; f_i(x) + \langle z-x,\, y_i\rangle, \qquad \forall z\in C, $$

and we can sum across these inequalities to get

$$ g(z) \;\ge\; g(x) + \Big\langle z-x,\, \sum_i y_i\Big\rangle, \qquad \forall z\in C, $$

so that $\sum_i y_i \in \partial g(x)$. Note that if $0\in\sum_i \partial f_i(x^\dagger)$ then the inclusion implies $0\in\partial \sum_i f_i(x^\dagger)$, which is sufficient to show that $x^\dagger$ is a minimiser.
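As an illustration of this sum rule (a sketch with an example I picked, $f_1(x)=|x|$ and $f_2(x)=(x-1)^2$, not one from the notes):

```python
import numpy as np

# g = f1 + f2 with f1(x) = |x| and f2(x) = (x - 1)^2. At x† = 1/2 we can
# pick y1 = sign(1/2) = 1 in ∂f1(x†) and y2 = 2(1/2 - 1) = -1 in ∂f2(x†).
g = lambda x: np.abs(x) + (x - 1.0) ** 2

x_dag = 0.5
y1 = np.sign(x_dag)          # subgradient of f1 at x†
y2 = 2.0 * (x_dag - 1.0)     # gradient of f2 at x†
y = y1 + y2                  # element of ∂f1(x†) + ∂f2(x†) ⊆ ∂g(x†)
print(y)                     # 0.0: the FOC holds, so x† minimises g

# Sanity check of the summed subgradient inequality on a grid.
zs = np.linspace(-2.0, 3.0, 501)
assert np.all(g(zs) >= g(x_dag) + (zs - x_dag) * y - 1e-12)
```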

Additional references

  1. Boyd and Vandenberghe, Subgradients, 2008. – Accessible lecture notes introducing the subgradient and proving that the subdifferential of a convex function is non-empty and closed at any point in the interior of the domain of the function.

  2. Rockafellar, Convex Analysis, 1970.

See also the general references mentioned in the introduction.