In these notes, I show how some well-known methods from numerical linear algebra can be applied to convex optimisation. The aim is to give an idea of how the following topics intertwine:
- solving a system of linear equations via iterative methods,
- operator splitting techniques (Gauss-Seidel, Douglas-Rachford, ...),
- the proximal operator,
- the alternating direction method of multipliers, also known as ADM or ADMM.
In machine learning, imaging, etc., a large portion of convex optimisation problems can be written in the form

$$ x^\star \,\in\, \arg\min_x\; f(x) + g(x). \tag{2} $$

This includes constrained problems, where $g = \iota_C$ is the indicator of a convex set $C$, as well as penalised problems like the LASSO regression:

$$ \min_x\; \tfrac12\|Ax - b\|_2^2 + \lambda\|x\|_1. $$
In a similar vein to the previous notes, the following regularity conditions are usually assumed to hold:

- $f, g \in \Gamma_0(\mathbb{R}^n)$, the space of convex, proper, lower semi-continuous functions,
- $f$ and $g$ are such that, on $\operatorname{dom} f \cap \operatorname{dom} g$, $\partial(f+g) = \partial f + \partial g$.
As we showed in convex analysis part 1, for $x^\star$ to be a minimiser, it must verify the first order condition, i.e.:

$$ 0 \,\in\, \partial(f+g)(x^\star) \,=\, (\partial f + \partial g)(x^\star), \tag{3} $$

or, equivalently, $x^\star \in (\partial f + \partial g)^{-1}(0)$; the issue being that, in most cases, we don't have a (cheaply available) closed-form expression for the inverse operator $(\partial f + \partial g)^{-1}$ (otherwise the problem is trivial).
This issue can in fact be related to the classical problem of solving a linear system of equations:

$$ Ax \,=\, b, \tag{4} $$

where $A$ is invertible but is, for instance, too big or too poorly conditioned for its inverse to be computed cheaply and reliably.
One way of attempting to solve (4) without computing the inverse of $A$ is to consider a splitting method: a decomposition of $A$ into a sum of matrices with desirable properties, say $A = B + C$. The equation (4) can then be written in the form of a fixed-point equation:

$$ Bx \,=\, b - Cx \quad\Longleftrightarrow\quad x \,=\, B^{-1}(b - Cx). \tag{5} $$

Assuming that $B$ is easier to invert than $A$, we can consider the fixed-point iteration algorithm:

$$ x^{k+1} \,=\, B^{-1}(b - Cx^k). \tag{6} $$
There are two classical examples of this type of splitting:

- the Jacobi method, writing $A = D + R$ with $D$ diagonal and $R$ the off-diagonal remainder,
- the Gauss-Seidel method, writing $A = L + U$ with $L$ and $U$ lower and (strictly) upper triangular respectively.
Under some conditions, the corresponding fixed-point iterations converge (see also Ortega and Rheinboldt (2000)). For instance, if $A$ is symmetric positive definite or if it is strictly diagonally dominant, then Gauss-Seidel converges.
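To make this concrete, here is a minimal numerical sketch (in Python with NumPy, my own choice since these notes don't fix a language) applying the fixed-point iteration $x^{k+1} = B^{-1}(b - Cx^k)$ with the Jacobi and Gauss-Seidel splittings; the test matrix is made strictly diagonally dominant so that both iterations are guaranteed to converge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Illustrative system: make A strictly diagonally dominant so that
# both Jacobi and Gauss-Seidel provably converge.
A = rng.standard_normal((n, n))
A += np.diag(2 * np.abs(A).sum(axis=1))
b = rng.standard_normal(n)

def splitting_iteration(B, C, b, iters=100):
    """Fixed-point iteration x <- B^{-1}(b - C x) for the split A = B + C."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = np.linalg.solve(B, b - C @ x)
    return x

# Jacobi: B is the diagonal part of A.
B_jac = np.diag(np.diag(A))
x_jac = splitting_iteration(B_jac, A - B_jac, b)

# Gauss-Seidel: B is the lower-triangular part of A (diagonal included).
B_gs = np.tril(A)
x_gs = splitting_iteration(B_gs, A - B_gs, b)

print(np.linalg.norm(A @ x_jac - b), np.linalg.norm(A @ x_gs - b))  # both residuals ~0
```

(In practice one would of course exploit the structure of $B$ rather than call a generic solver at each step; the point here is only the fixed-point structure.)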
Researchers like Douglas, Peaceman and Rachford studied this in the mid 1950s to solve linear systems arising from the discretisation of systems of partial differential equations (Peaceman and Rachford (1955), Douglas and Rachford (1956)). They came up with what is now known as the Douglas-Rachford splitting and the Peaceman-Rachford splitting.
The context is simple: assume that we have a decomposition $A = B + C$ where $B$ and/or $C$ are poorly conditioned or even singular. In that case, one can try to regularise them by writing

$$ (\alpha I + B) \quad\text{and}\quad (\alpha I + C) \tag{7} $$

for some $\alpha > 0$, which shifts the smallest singular values of $B$ and $C$ away from zero (and thereby pushes the regularised matrices towards diagonal dominance). The fixed-point equation corresponding to this split is

$$ (\alpha I + B)\,x \,=\, (\alpha I - C)\,x + b. \tag{8} $$
Observe that for a suitably large $\alpha$ we could also consider the fixed-point equation derived from (8) where the roles of $B$ and $C$ are swapped. The resulting fixed-point equation is equivalent to (8) but the corresponding fixed-point iteration is not, and the DPR method suggests alternating between both.
(DPR iterative method) let $\alpha > 0$ and $x^0 \in \mathbb{R}^n$, the DPR iterative method is given by

$$ \begin{aligned} x^{k+1/2} &= (\alpha I + B)^{-1}\big[(\alpha I - C)\,x^k + b\big],\\ x^{k+1} &= (\alpha I + C)^{-1}\big[(\alpha I - B)\,x^{k+1/2} + b\big], \end{aligned} \tag{9} $$

and converges to the solution of (4) provided $\alpha$ is sufficiently large.
This method belongs to a class of methods known as alternating direction methods...
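Here is a small numerical sketch of the DPR iteration, again in Python/NumPy (my choice). The splitting uses two symmetric positive definite matrices $B$ and $C$, a setting in which the alternating iteration is known to converge; the sizes and the value of $\alpha$ are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Illustrative splitting A = B + C with B and C symmetric positive definite.
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B = M1 @ M1.T + 0.1 * np.eye(n)
C = M2 @ M2.T + 0.1 * np.eye(n)
A = B + C
b = rng.standard_normal(n)

alpha = 3.0          # illustrative value of the regularisation parameter
I = np.eye(n)
x = np.zeros(n)

for _ in range(300):
    # half step: (alpha I + B) x_half = (alpha I - C) x + b
    x_half = np.linalg.solve(alpha * I + B, (alpha * I - C) @ x + b)
    # full step: (alpha I + C) x_new = (alpha I - B) x_half + b
    x = np.linalg.solve(alpha * I + C, (alpha * I - B) @ x_half + b)

print(np.linalg.norm(A @ x - b))   # residual, should be close to zero
```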
Consider now the kernel problem, i.e., finding $x$ such that $Ax = 0$ (i.e. $x \in \ker A$), still with $A = B + C$. Let $\alpha > 0$; introducing a copy $z$ of $x$ and an auxiliary variable $w$ meant to track $Cz$, we can consider a triplet of fixed-point equations:

$$ \begin{aligned} (\alpha I + B)\,x &= \alpha z - w,\\ (\alpha I + C)\,z &= \alpha x + w,\\ w &= Cz. \end{aligned} \tag{10} $$

(At a fixed point, the last two equations force $x = z$, and the first then gives $Bx = -Cx$, i.e. $Ax = 0$.) We can then intertwine the corresponding fixed-point iterations as follows:

$$ \begin{aligned} x^{k+1} &= (\alpha I + B)^{-1}\,(\alpha z^k - w^k),\\ z^{k+1} &= (\alpha I + C)^{-1}\,(\alpha x^{k+1} + w^k),\\ w^{k+1} &= C z^{k+1}. \end{aligned} \tag{11} $$

Let now $u^k := w^k/\alpha$ and note that $u^{k+1} = u^k + x^{k+1} - z^{k+1}$ using the second iteration (it gives $Cz^{k+1} = w^k + \alpha(x^{k+1} - z^{k+1})$). This leads to an iterative method to solve $Ax = 0$ which we will show to be directly connected to the ADMM.
(DPR iterative method (2)) let $\alpha > 0$ and $x^0, z^0, u^0 \in \mathbb{R}^n$, the DPR2 iterative method is given by

$$ \begin{aligned} x^{k+1} &= (\alpha I + B)^{-1}\,\alpha\,(z^k - u^k),\\ z^{k+1} &= (\alpha I + C)^{-1}\,\alpha\,(x^{k+1} + u^k),\\ u^{k+1} &= u^k + x^{k+1} - z^{k+1}, \end{aligned} \tag{12} $$

and converges to a solution of $Ax = 0$ provided $\alpha$ is sufficiently large.
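As a sanity check of DPR2, here is a short Python/NumPy sketch (the construction is an illustrative choice of mine) on a singular splitting: $B$ and $C$ are positive semi-definite and share a null direction, so that $A = B + C$ has a nontrivial kernel, and the iterates should settle on a point with $Ax \approx 0$ and $x \approx z$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20

# Illustrative singular splitting: B and C are PSD and share the null vector v,
# so A = B + C is singular and its kernel contains span{v}.
v = rng.standard_normal(n); v /= np.linalg.norm(v)
P = rng.standard_normal((n - 1, n)); P -= np.outer(P @ v, v)   # rows orthogonal to v
Q = rng.standard_normal((n - 1, n)); Q -= np.outer(Q @ v, v)
B, C = P.T @ P, Q.T @ Q
A = B + C

alpha = 1.0                       # illustrative value
I = np.eye(n)
z, u = rng.standard_normal(n), np.zeros(n)

for _ in range(1000):
    x = np.linalg.solve(alpha * I + B, alpha * (z - u))
    z = np.linalg.solve(alpha * I + C, alpha * (x + u))
    u = u + x - z

# x should (approximately) lie in ker(A); which kernel element depends on the initialisation.
print(np.linalg.norm(A @ x), np.linalg.norm(x - z))
```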
Going back to problem (2), we had noted that a minimiser $x^\star$ must be in the kernel (zero set) of the operator $\partial f + \partial g$:

$$ 0 \,\in\, (\partial f + \partial g)(x^\star). \tag{13} $$

Since we've just seen that splitting operators could be a good idea in linear algebra, we could be tempted to apply exactly the same approach here. But in order to do this, we need to consider the inverse of the following two operators: $(\alpha\,\mathrm{id} + \partial f)$ and $(\alpha\,\mathrm{id} + \partial g)$.
Proximal operators can be introduced from a number of nice perspectives and are usually attributed to Moreau (see e.g. Moreau (1965)). Here we'll just cover them briefly, aiming to define the prox of a function $f$, denoted $\operatorname{prox}_{\gamma f}$, and to show a key result, i.e. that $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$.
Let $x$ and $z$ be such that $z \in (\mathrm{id} + \gamma\,\partial f)(x)$. We are interested in the inverse map $(\mathrm{id} + \gamma\,\partial f)^{-1}$ or, in other words, in expressing $x$ in terms of $z$. Rearranging the inclusion, note that we have

$$ 0 \,\in\, \gamma\,\partial f(x) + (x - z). \tag{14} $$

Observe that the simple affine map $x \mapsto x - z$ can be re-expressed as the gradient of a squared norm:

$$ x - z \,=\, \nabla_x \left[\tfrac12\|x - z\|_2^2\right]. \tag{15} $$

Therefore, we can write (14) as

$$ 0 \,\in\, \partial_x \left[\gamma f(x) + \tfrac12\|x - z\|_2^2\right]. \tag{16} $$

This can be interpreted as a first order condition (FOC) and is equivalent to

$$ x \,=\, \arg\min_u \left[\gamma f(u) + \tfrac12\|u - z\|_2^2\right], $$

which defines the prox of $f$.
For a convex function $f$ and $\gamma > 0$, the proximal operator of $\gamma f$ at a point $z$ is defined as

$$ \operatorname{prox}_{\gamma f}(z) \,:=\, \arg\min_u \left[ f(u) + \frac{1}{2\gamma}\|u - z\|_2^2 \right] \tag{17} $$

and is such that $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$.
Note that $(\alpha\,\mathrm{id} + \partial f)^{-1}(\alpha v) = (\mathrm{id} + \gamma\,\partial f)^{-1}(v)$ for $\gamma = \alpha^{-1}$, so that the inverse operators we needed above are exactly proximal operators: $(\alpha\,\mathrm{id} + \partial f)^{-1}(\alpha v) = \operatorname{prox}_{\gamma f}(v)$.
Note also that, for any $\gamma > 0$, the objective in (17) is strongly convex ($f$ is convex and the quadratic term is $\gamma^{-1}$-strongly convex) and can therefore only have a unique minimiser, meaning that $\operatorname{prox}_{\gamma f}$ is a well-defined (single-valued) function.
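To see the identity $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$ in action, here is a small Python sketch (using NumPy and SciPy, my choice) for the smooth case $f(x) = \tfrac12 x^\top Q x$: the prox obtained by numerically minimising the objective in (17) matches the resolvent $(I + \gamma Q)^{-1} z$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, gamma = 5, 0.7

# Illustrative smooth convex function f(x) = 0.5 x' Q x with Q positive definite.
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)
z = rng.standard_normal(n)

# prox_{gamma f}(z): minimise f(u) + ||u - z||^2 / (2 gamma) numerically.
objective = lambda u: 0.5 * u @ Q @ u + np.sum((u - z) ** 2) / (2 * gamma)
prox_numerical = minimize(objective, np.zeros(n)).x

# Resolvent (id + gamma * grad f)^{-1}(z) = (I + gamma Q)^{-1} z in closed form.
prox_resolvent = np.linalg.solve(np.eye(n) + gamma * Q, z)

print(np.max(np.abs(prox_numerical - prox_resolvent)))  # agreement up to solver tolerance
```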
Remark: it may look like we just conjured this proximal operator out of the abyss for nothing, but it turns out that the proximal operator exists in closed form for a number of important functions. Among the best-known examples are the $\ell_1$-norm, whose prox is the soft-thresholding operator, and the indicator of a convex set, whose prox is the orthogonal projection onto that set.
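Both examples are one-liners; here is a minimal Python/NumPy sketch of the soft-thresholding operator and of the projection onto a box (the box is just one convenient convex set whose projection is explicit, an illustrative choice).

```python
import numpy as np

def prox_l1(z, t):
    """Soft-thresholding: prox of t * ||.||_1 evaluated at z."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_box(z, lo=-1.0, hi=1.0):
    """Prox of the indicator of the box [lo, hi]^n, i.e. the orthogonal projection onto it."""
    return np.clip(z, lo, hi)

z = np.array([-2.5, -0.3, 0.0, 0.4, 3.0])
print(prox_l1(z, 0.5))   # [-2.   0.   0.   0.   2.5]
print(prox_box(z))       # [-1.  -0.3  0.   0.4  1. ]
```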
Hopefully you saw this one coming: if you take DPR2 (12) and simply replace $B$ by $\partial f$, $C$ by $\partial g$ and pepper with $\gamma = \alpha^{-1}$, you get the ADMM (see e.g. Combettes and Pesquet (2011)).
(Alternating direction method of multipliers (ADMM)) the minimisation problem (2) can be tackled with the following elegant iteration:

$$ \begin{aligned} x^{k+1} &= \operatorname{prox}_{\gamma f}(z^k - u^k),\\ z^{k+1} &= \operatorname{prox}_{\gamma g}(x^{k+1} + u^k),\\ u^{k+1} &= u^k + x^{k+1} - z^{k+1}, \end{aligned} \tag{18} $$

which converges provided $\gamma$ is small enough.
When is this helpful? A frequent scenario has $f$ complex but differentiable and $g$ simple but non-differentiable (e.g. the $\ell_1$-norm); in that case, evaluating the first prox amounts to solving a differentiable problem, which can be done (approximately) with a simple/cheap first-order method, and the second prox exists in closed form. For instance, regularised maximum likelihood estimation or regularised inverse problems typically have this form.
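As an illustration of this scenario, here is a small Python/NumPy sketch of the ADMM iteration (18) on a synthetic LASSO instance (an illustrative instance, not a tuned solver; the sizes, $\lambda$ and $\gamma$ are arbitrary choices of mine). With $f(x) = \tfrac12\|Ax - b\|_2^2$ the first prox reduces to a linear solve, and with $g = \lambda\|\cdot\|_1$ the second prox is the soft-thresholding operator seen above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, gamma = 40, 60, 0.1, 1.0   # illustrative sizes and parameters

# Synthetic LASSO instance: f(x) = 0.5 ||Ax - b||^2 (smooth), g(x) = lam * ||x||_1 (simple).
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# prox_{gamma f}(v) solves (A'A + I/gamma) x = A'b + v/gamma; cache the inverse once.
H = np.linalg.inv(A.T @ A + np.eye(n) / gamma)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = z = u = np.zeros(n)
for _ in range(300):
    x = H @ (A.T @ b + (z - u) / gamma)   # x^{k+1} = prox_{gamma f}(z^k - u^k)
    z = soft(x + u, gamma * lam)          # z^{k+1} = prox_{gamma g}(x^{k+1} + u^k)
    u = u + x - z                         # u^{k+1} = u^k + x^{k+1} - z^{k+1}

# Consensus gap should be ~0; z has exact zeros produced by the soft-threshold.
print(np.linalg.norm(x - z), np.count_nonzero(z))
```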
Proximal methods
Combettes and Pesquet, Proximal splitting methods in signal processing, 2011. – A detailed review on proximal methods, accessible and comprehensive.
Moreau, Proximité et dualité dans un espace hilbertien, 1965. – A wonderful seminal paper, clear and complete, a great read if you understand French (and even if you don't you should be able to follow the equations).
Linear algebra
Peaceman and Rachford, The numerical solution of parabolic and elliptic differential equations, 1955.
Douglas and Rachford, On the numerical solution of heat conduction problems in two and three space variables, 1956.
Ortega and Rheinboldt, Iterative solution of nonlinear equations in several variables, 2000.