In these notes, I show how some well-known methods from numerical linear algebra can be applied to convex optimisation. The aim is to give an idea of how the following topics intertwine:
- solving a system of linear equations via iterative methods,
- operator splitting techniques (Gauss-Seidel, Douglas-Rachford, ...),
- the proximal operator,
- the alternating direction method of multipliers, also known as ADM or ADMM.
In machine learning, imaging, etc., a large portion of convex optimisation problems can be written in the form

$$ x^\star \,\in\, \arg\min_x\; f(x) + g(x). \tag{2} $$

This includes constrained problems, where $g = \iota_C$ is the indicator of a convex set $C$, as well as penalised problems like the LASSO regression:

$$ \min_x\; \tfrac12\|Ax - b\|_2^2 + \lambda\|x\|_1. $$
In a similar vein to the previous notes, the following regularity conditions are usually assumed to hold:

- $f, g \in \Gamma_0(\mathbb{R}^n)$, the space of convex, proper, lower semi-continuous functions,
- $f$ and $g$ are such that, on $\operatorname{dom} f \cap \operatorname{dom} g$, $\partial(f+g) = \partial f + \partial g$.
As we showed in convex analysis part 1, for $x^\star$ to be a minimiser, it must verify the first order condition, i.e.:

$$ 0 \,\in\, \partial(f+g)(x^\star) \,=\, (\partial f + \partial g)(x^\star), \tag{3} $$

or, equivalently, $x^\star \in (\partial f + \partial g)^{-1}(0)$; the issue being that, in most cases, we don't have a (cheaply available) closed-form expression for the inverse operator $(\partial f + \partial g)^{-1}$ (otherwise the problem is trivial).
This issue can in fact be related to the classical problem of solving a linear system of equations:

$$ Ax \,=\, b, \tag{4} $$

where $A$ is invertible but is, for instance, too big or too poorly conditioned for its inverse to be computed cheaply and reliably.
One way of attempting to solve (4) without computing the inverse of $A$ is to consider a splitting method: a decomposition of $A$ into a sum of matrices with desirable properties, say $A = B + C$. The equation (4) can then be written in the form of a fixed-point equation:

$$ Bx \,=\, b - Cx \quad\Longleftrightarrow\quad x \,=\, B^{-1}(b - Cx). \tag{5} $$

Assuming that $B$ is easier to invert than $A$, we can consider the fixed-point iteration algorithm:

$$ x^{k+1} \,=\, B^{-1}(b - Cx^k). \tag{6} $$
There are two classical examples of this type of splitting:

- the Jacobi method, writing $A = D + R$ with $D$ diagonal and $R$ the off-diagonal remainder,
- the Gauss-Seidel method, writing $A = L + U$ with $L$ and $U$ lower and (strictly) upper triangular respectively.
Under some conditions, the corresponding fixed-point iterations converge (see also Ortega and Rheinboldt (2000)). For instance, if $A$ is symmetric positive definite or if it is strictly diagonally dominant, then Gauss-Seidel converges.
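To make this concrete, here is a minimal numerical sketch (in Python with NumPy, my own choice since these notes don't fix a language) applying the fixed-point iteration $x^{k+1} = B^{-1}(b - Cx^k)$ with the Jacobi and Gauss-Seidel splittings; the test matrix is made strictly diagonally dominant so that both iterations are guaranteed to converge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Illustrative system: make A strictly diagonally dominant so that
# both Jacobi and Gauss-Seidel provably converge.
A = rng.standard_normal((n, n))
A += np.diag(2 * np.abs(A).sum(axis=1))
b = rng.standard_normal(n)

def splitting_iteration(B, C, b, iters=100):
    """Fixed-point iteration x <- B^{-1}(b - C x) for the split A = B + C."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = np.linalg.solve(B, b - C @ x)
    return x

# Jacobi: B is the diagonal part of A.
B_jac = np.diag(np.diag(A))
x_jac = splitting_iteration(B_jac, A - B_jac, b)

# Gauss-Seidel: B is the lower-triangular part of A (diagonal included).
B_gs = np.tril(A)
x_gs = splitting_iteration(B_gs, A - B_gs, b)

print(np.linalg.norm(A @ x_jac - b), np.linalg.norm(A @ x_gs - b))  # both residuals ~0
```

(In practice one would of course exploit the structure of $B$ rather than call a generic solver at each step; the point here is only the fixed-point structure.)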
Researchers like Douglas, Peaceman and Rachford studied this in the mid 1950s to solve linear systems arising from the discretisation of systems of partial differential equations (Peaceman and Rachford (1955), Douglas and Rachford (1956)). They came up with what is now known as the Douglas-Rachford splitting and the Peaceman-Rachford splitting.
The context is simple: assume that we have a decomposition $A = B + C$ where $B$ and/or $C$ are poorly conditioned or even singular. In that case, one can try to regularise them by writing

$$ (\alpha I + B) \quad\text{and}\quad (\alpha I + C) \tag{7} $$

for some $\alpha > 0$, which shifts the smallest singular values of $B$ and $C$ away from zero (and thereby pushes the regularised matrices towards diagonal dominance). The fixed-point equation corresponding to this split is

$$ (\alpha I + B)\,x \,=\, (\alpha I - C)\,x + b. \tag{8} $$
Observe that for a suitably large $\alpha$ we could also consider the fixed-point equation derived from (8) where the roles of $B$ and $C$ are swapped. The resulting fixed-point equation is equivalent to (8) but the corresponding fixed-point iteration is not, and the DPR method suggests alternating between both.
(DPR iterative method) let $\alpha > 0$ and $x^0 \in \mathbb{R}^n$, the DPR iterative method is given by

$$ \begin{aligned} x^{k+1/2} &= (\alpha I + B)^{-1}\big[(\alpha I - C)\,x^k + b\big],\\ x^{k+1} &= (\alpha I + C)^{-1}\big[(\alpha I - B)\,x^{k+1/2} + b\big], \end{aligned} \tag{9} $$

and converges to the solution of (4) provided $\alpha$ is sufficiently large.
This method belongs to a class of methods known as alternating direction methods...
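Here is a small numerical sketch of the DPR iteration, again in Python/NumPy (my choice). The splitting uses two symmetric positive definite matrices $B$ and $C$, a setting in which the alternating iteration is known to converge; the sizes and the value of $\alpha$ are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Illustrative splitting A = B + C with B and C symmetric positive definite.
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B = M1 @ M1.T + 0.1 * np.eye(n)
C = M2 @ M2.T + 0.1 * np.eye(n)
A = B + C
b = rng.standard_normal(n)

alpha = 3.0          # illustrative value of the regularisation parameter
I = np.eye(n)
x = np.zeros(n)

for _ in range(300):
    # half step: (alpha I + B) x_half = (alpha I - C) x + b
    x_half = np.linalg.solve(alpha * I + B, (alpha * I - C) @ x + b)
    # full step: (alpha I + C) x_new = (alpha I - B) x_half + b
    x = np.linalg.solve(alpha * I + C, (alpha * I - B) @ x_half + b)

print(np.linalg.norm(A @ x - b))   # residual, should be close to zero
```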
Consider now the kernel problem, i.e., finding $x$ such that $Ax = 0$ (i.e. $x \in \ker A$), still with $A = B + C$. Let $\alpha > 0$; introducing a copy $z$ of $x$ and an auxiliary variable $w$ meant to track $Cz$, we can consider a triplet of fixed-point equations:

$$ \begin{aligned} (\alpha I + B)\,x &= \alpha z - w,\\ (\alpha I + C)\,z &= \alpha x + w,\\ w &= Cz. \end{aligned} \tag{10} $$

(At a fixed point, the last two equations force $x = z$, and the first then gives $Bx = -Cx$, i.e. $Ax = 0$.) We can then intertwine the corresponding fixed-point iterations as follows:

$$ \begin{aligned} x^{k+1} &= (\alpha I + B)^{-1}\,(\alpha z^k - w^k),\\ z^{k+1} &= (\alpha I + C)^{-1}\,(\alpha x^{k+1} + w^k),\\ w^{k+1} &= C z^{k+1}. \end{aligned} \tag{11} $$

Let now $u^k := w^k/\alpha$ and note that $u^{k+1} = u^k + x^{k+1} - z^{k+1}$ using the second iteration (it gives $Cz^{k+1} = w^k + \alpha(x^{k+1} - z^{k+1})$). This leads to an iterative method to solve $Ax = 0$ which we will show to be directly connected to the ADMM.
(DPR iterative method (2)) let $\alpha > 0$ and $x^0, z^0, u^0 \in \mathbb{R}^n$, the DPR2 iterative method is given by

$$ \begin{aligned} x^{k+1} &= (\alpha I + B)^{-1}\,\alpha\,(z^k - u^k),\\ z^{k+1} &= (\alpha I + C)^{-1}\,\alpha\,(x^{k+1} + u^k),\\ u^{k+1} &= u^k + x^{k+1} - z^{k+1}, \end{aligned} \tag{12} $$

and converges to a solution of $Ax = 0$ provided $\alpha$ is sufficiently large.
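As a sanity check of DPR2, here is a short Python/NumPy sketch (the construction is an illustrative choice of mine) on a singular splitting: $B$ and $C$ are positive semi-definite and share a null direction, so that $A = B + C$ has a nontrivial kernel, and the iterates should settle on a point with $Ax \approx 0$ and $x \approx z$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20

# Illustrative singular splitting: B and C are PSD and share the null vector v,
# so A = B + C is singular and its kernel contains span{v}.
v = rng.standard_normal(n); v /= np.linalg.norm(v)
P = rng.standard_normal((n - 1, n)); P -= np.outer(P @ v, v)   # rows orthogonal to v
Q = rng.standard_normal((n - 1, n)); Q -= np.outer(Q @ v, v)
B, C = P.T @ P, Q.T @ Q
A = B + C

alpha = 1.0                       # illustrative value
I = np.eye(n)
z, u = rng.standard_normal(n), np.zeros(n)

for _ in range(1000):
    x = np.linalg.solve(alpha * I + B, alpha * (z - u))
    z = np.linalg.solve(alpha * I + C, alpha * (x + u))
    u = u + x - z

# x should (approximately) lie in ker(A); which kernel element depends on the initialisation.
print(np.linalg.norm(A @ x), np.linalg.norm(x - z))
```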
Going back to problem (2), we had noted that a minimiser $x^\star$ must be in the kernel (zero set) of the operator $\partial f + \partial g$:

$$ 0 \,\in\, (\partial f + \partial g)(x^\star). \tag{13} $$

Since we've just seen that splitting operators could be a good idea in linear algebra, we could be tempted to apply exactly the same approach here. But in order to do this, we need to consider the inverse of the following two operators: $(\alpha\,\mathrm{id} + \partial f)$ and $(\alpha\,\mathrm{id} + \partial g)$.
Proximal operators can be introduced from a number of nice perspectives and are usually attributed to Moreau (see e.g. Moreau (1965)). Here we'll just cover them briefly, aiming to define the prox of a function $f$, denoted $\operatorname{prox}_{\gamma f}$, and to show a key result, i.e. that $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$.
Let $x$ and $z$ be such that $z \in (\mathrm{id} + \gamma\,\partial f)(x)$. We are interested in the inverse map $(\mathrm{id} + \gamma\,\partial f)^{-1}$ or, in other words, in expressing $x$ in terms of $z$. Rearranging the inclusion, note that we have

$$ 0 \,\in\, \gamma\,\partial f(x) + (x - z). \tag{14} $$

Observe that the simple affine map $x \mapsto x - z$ can be re-expressed as the gradient of a squared norm:

$$ x - z \,=\, \nabla_x \left[\tfrac12\|x - z\|_2^2\right]. \tag{15} $$

Therefore, we can write (14) as

$$ 0 \,\in\, \partial_x \left[\gamma f(x) + \tfrac12\|x - z\|_2^2\right]. \tag{16} $$

This can be interpreted as a first order condition (FOC) and is equivalent to

$$ x \,=\, \arg\min_u \left[\gamma f(u) + \tfrac12\|u - z\|_2^2\right], $$

which defines the prox of $f$.
For a convex function $f$ and $\gamma > 0$, the proximal operator of $\gamma f$ at a point $z$ is defined as

$$ \operatorname{prox}_{\gamma f}(z) \,:=\, \arg\min_u \left[ f(u) + \frac{1}{2\gamma}\|u - z\|_2^2 \right] \tag{17} $$

and is such that $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$.
Note that $(\alpha\,\mathrm{id} + \partial f)^{-1}(\alpha v) = (\mathrm{id} + \gamma\,\partial f)^{-1}(v)$ for $\gamma = \alpha^{-1}$, so that the inverse operators we needed above are exactly proximal operators: $(\alpha\,\mathrm{id} + \partial f)^{-1}(\alpha v) = \operatorname{prox}_{\gamma f}(v)$.
Note also that, for any $\gamma > 0$, the objective in (17) is strongly convex ($f$ is convex and the quadratic term is $\gamma^{-1}$-strongly convex) and can therefore only have a unique minimiser, meaning that $\operatorname{prox}_{\gamma f}$ is a well-defined (single-valued) function.
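To see the identity $\operatorname{prox}_{\gamma f} = (\mathrm{id} + \gamma\,\partial f)^{-1}$ in action, here is a small Python sketch (using NumPy and SciPy, my choice) for the smooth case $f(x) = \tfrac12 x^\top Q x$: the prox obtained by numerically minimising the objective in (17) matches the resolvent $(I + \gamma Q)^{-1} z$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, gamma = 5, 0.7

# Illustrative smooth convex function f(x) = 0.5 x' Q x with Q positive definite.
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)
z = rng.standard_normal(n)

# prox_{gamma f}(z): minimise f(u) + ||u - z||^2 / (2 gamma) numerically.
objective = lambda u: 0.5 * u @ Q @ u + np.sum((u - z) ** 2) / (2 * gamma)
prox_numerical = minimize(objective, np.zeros(n)).x

# Resolvent (id + gamma * grad f)^{-1}(z) = (I + gamma Q)^{-1} z in closed form.
prox_resolvent = np.linalg.solve(np.eye(n) + gamma * Q, z)

print(np.max(np.abs(prox_numerical - prox_resolvent)))  # agreement up to solver tolerance
```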
Remark: it may look like we just conjured this proximal operator out of the abyss for nothing, but it turns out that the proximal operator exists in closed form for a number of important functions. Among the best-known examples are the $\ell_1$-norm, whose prox is the soft-thresholding operator, and the indicator of a convex set, whose prox is the orthogonal projection onto that set.
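Both examples are one-liners; here is a minimal Python/NumPy sketch of the soft-thresholding operator and of the projection onto a box (the box is just one convenient convex set whose projection is explicit, an illustrative choice).

```python
import numpy as np

def prox_l1(z, t):
    """Soft-thresholding: prox of t * ||.||_1 evaluated at z."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_box(z, lo=-1.0, hi=1.0):
    """Prox of the indicator of the box [lo, hi]^n, i.e. the orthogonal projection onto it."""
    return np.clip(z, lo, hi)

z = np.array([-2.5, -0.3, 0.0, 0.4, 3.0])
print(prox_l1(z, 0.5))   # [-2.   0.   0.   0.   2.5]
print(prox_box(z))       # [-1.  -0.3  0.   0.4  1. ]
```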
Hopefully you saw this one coming: if you take DPR2 (12) and simply replace $B$ by $\partial f$, $C$ by $\partial g$ and pepper with $\gamma = \alpha^{-1}$, you get the ADMM (see e.g. Combettes and Pesquet (2011)).
(Alternating direction method of multipliers (ADMM)) the minimisation problem (2) can be tackled with the following elegant iteration:

$$ \begin{aligned} x^{k+1} &= \operatorname{prox}_{\gamma f}(z^k - u^k),\\ z^{k+1} &= \operatorname{prox}_{\gamma g}(x^{k+1} + u^k),\\ u^{k+1} &= u^k + x^{k+1} - z^{k+1}, \end{aligned} \tag{18} $$

which converges provided $\gamma$ is small enough.
When is this helpful? A frequent scenario has $f$ complex but differentiable and $g$ simple but non-differentiable (e.g. the $\ell_1$-norm); in that case, evaluating the first prox amounts to solving a differentiable problem, which can be done (approximately) with a simple/cheap first-order method, and the second prox exists in closed form. For instance, regularised maximum likelihood estimation or regularised inverse problems typically have this form.
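As an illustration of this scenario, here is a small Python/NumPy sketch of the ADMM iteration (18) on a synthetic LASSO instance (an illustrative instance, not a tuned solver; the sizes, $\lambda$ and $\gamma$ are arbitrary choices of mine). With $f(x) = \tfrac12\|Ax - b\|_2^2$ the first prox reduces to a linear solve, and with $g = \lambda\|\cdot\|_1$ the second prox is the soft-thresholding operator seen above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, gamma = 40, 60, 0.1, 1.0   # illustrative sizes and parameters

# Synthetic LASSO instance: f(x) = 0.5 ||Ax - b||^2 (smooth), g(x) = lam * ||x||_1 (simple).
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# prox_{gamma f}(v) solves (A'A + I/gamma) x = A'b + v/gamma; cache the inverse once.
H = np.linalg.inv(A.T @ A + np.eye(n) / gamma)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = z = u = np.zeros(n)
for _ in range(300):
    x = H @ (A.T @ b + (z - u) / gamma)   # x^{k+1} = prox_{gamma f}(z^k - u^k)
    z = soft(x + u, gamma * lam)          # z^{k+1} = prox_{gamma g}(x^{k+1} + u^k)
    u = u + x - z                         # u^{k+1} = u^k + x^{k+1} - z^{k+1}

# Consensus gap should be ~0; z has exact zeros produced by the soft-threshold.
print(np.linalg.norm(x - z), np.count_nonzero(z))
```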
Proximal methods
Combettes and Pesquet, Proximal splitting methods in signal processing, 2011. – A detailed review on proximal methods, accessible and comprehensive.
Moreau, Proximité et dualité dans un espace hilbertien, 1965. – A wonderful seminal paper, clear and complete, a great read if you understand French (and even if you don't you should be able to follow the equations).
Linear algebra
Peaceman and Rachford, The numerical solution of parabolic and elliptic differential equations, 1955.
Douglas and Rachford, On the numerical solution of heat conduction problems in two and three space variables, 1956.
Ortega and Rheinboldt, Iterative solution of nonlinear equations in several variables, 2000.