Introduction to convex optimisation

We'll consider the standard constrained minimisation problem in convex optimisation:

\min_{x\in C}\quad f(x) \qquad (1)

where $C$ is a non-empty convex subset of $\mathbb R^n$ and $f$ is a "nice" convex function. The following notions of "nice" are used (unless explicitly specified):

Roughly speaking, these conditions guarantee that there is a solution to the problem and that we can find one applying some simple iterative algorithm. We will come back to these assumptions as we get to use them throughout the notes.
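A standard example to keep in mind: taking $f(x)=\tfrac12\|x-z\|_2^2$ for a given point $z\in\mathbb R^n$ turns (1) into

\min_{x\in C}\quad \tfrac12\|x-z\|_2^2,

whose solution is the Euclidean projection of $z$ onto $C$; this particular instance will reappear when deriving the projected gradient descent.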

We will also consider the unconstrained form of the problem, i.e.: when $C=\mathbb R^n$ (and will then just write $\min_x f(x)$). Constrained problems can always be interpreted as unconstrained problems: indeed, if we define the indicator of a convex set $C$ as

\begin{array}{rcl} i_C(x) &:=& \begin{cases} 0 & (x\in C) \\ +\infty & (x\notin C) \end{cases}\end{array}

then the constrained problem (1) is equivalent to

\min_{x\in \mathbb R^n} \quad f(x)+i_C(x).

This is not entirely pointless, as will become apparent when deriving the projected gradient descent.
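To make this reformulation concrete, here is a minimal Python sketch (the box constraint and the quadratic objective are arbitrary illustrative choices, not anything prescribed above): the composite objective $f+i_C$ agrees with $f$ on $C$ and is $+\infty$ outside of it, so minimising it over $\mathbb R^n$ amounts to minimising $f$ over $C$.

```python
import numpy as np

def f(x):
    # illustrative convex objective
    return np.sum((x - 2.0) ** 2)

def i_C(x, lo=0.0, hi=1.0):
    # indicator of the box C = [lo, hi]^n: 0 inside C, +infinity outside
    return 0.0 if np.all((x >= lo) & (x <= hi)) else np.inf

def F(x):
    # unconstrained objective f + i_C
    return f(x) + i_C(x)

print(F(np.array([0.5, 1.0])))  # feasible point: finite, equals f there
print(F(np.array([3.0, 0.5])))  # infeasible point: inf, can never be optimal
```

Since $F$ is infinite outside $C$, any minimiser of $F$ must lie in $C$, where $F$ and $f$ coincide.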

Iterative methods

Before delving into the details, it is useful to understand how algorithms for optimisation can often be constructed.

A big part of convex optimisation aims at defining clever iterative algorithms which, ideally, enjoy the following properties when started from a sensible initial point $x_0$:

Often these iterative algorithms can be derived from some kind of fixed point equation that is satisfied by a minimiser $x^\dagger$, i.e.: an equation of the form

\begin{array}{rcl} x^\dagger &=& P(x^\dagger)\end{array}

where $P$ is an appropriate operator. Provided we have such a fixed point equation, we can consider a fixed point algorithm, the simplest form of which is:

\begin{array}{rcl} x_{k+1} &=& P(x_k).\end{array}
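As a toy illustration of such a scheme (a minimal sketch with arbitrary problem data, anticipating the gradient descent recovered later in these notes), take $P$ to be the gradient step map $P(x)=x-t\nabla f(x)$ for a smooth convex $f$: its fixed points are exactly the points where $\nabla f$ vanishes, i.e. the minimisers of $f$.

```python
import numpy as np

# illustrative problem: f(x) = 0.5 * ||A x - b||^2
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A.T @ (A @ x - b)

def P(x, t=0.05):
    # candidate fixed point operator: the gradient step map;
    # x = P(x) if and only if grad_f(x) = 0
    return x - t * grad_f(x)

x = np.zeros(2)        # initial point x_0
for _ in range(500):   # fixed point iteration x_{k+1} = P(x_k)
    x = P(x)

print(x)                          # approximate minimiser
print(np.linalg.norm(grad_f(x)))  # close to 0 at a fixed point
```

The step size $t$ and the iteration count are ad hoc choices here, made only for illustration.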

Under some conditions on the operator $P$ and possibly on $x_0$, such an algorithm will provably converge to $x^\dagger$. This may seem reasonably straightforward, but there are quite a few difficult questions that need to be addressed and will be investigated in the notes:

In the rest of these notes, we will show how to obtain the fixed point equations and useful fixed point algorithms for a variety of scenarios and, by doing so, will recover well-known algorithms such as the classical gradient descent and the mirror descent.

General references

More precise pointers will be given on subsequent pages, but most of the content in these notes relates in one way or another to the following references (in particular the first one):

  1. Rockafellar: Convex Analysis, 1970. – A "must-read" in convex analysis, a reference of choice for technical details.

  2. Nesterov: Introductory Lectures on Convex Optimization, 1998. – Another must-read with all the technical details, more focused on algorithms and convergence rates than Rockafellar's book.

  3. Ben-Tal and Nemirovski: Lectures on Modern Convex Optimization, 2013. – A recent perspective on optimisation methods.

  4. Boyd and Vandenberghe: Convex Optimization, 2004. – More accessible and more oriented towards giving "tools" to the reader than the previous references.