Matrix inversion lemmas

The Woodbury formula is perhaps one of the most ubiquitous tricks in basic linear algebra: it starts with the explicit formula for the inverse of a block 2x2 matrix and leads to identities used in kernel theory, the Kalman filter, the combination of multivariate normals, etc.

In these notes I present a simple development leading to the Woodbury formula and its special case, the Sherman-Morrison formula, with some code showing them at work.

Partitioned matrix inversion

Consider an invertible matrix $M$ made of blocks $A$, $B$, $C$ and $D$ with

$$ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} $$

where both $A$ and $D$ are assumed to be square and invertible. The aim is to express the inverse of $M$ in terms of the blocks $A$, $B$, $C$ and $D$ and the inverses of $A$ and $D$. We can write the inverse of $M$ using an identical structure:

$$ M^{-1} = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}, $$

for some $W$, $X$, $Y$ and $Z$ to be determined. Using $MM^{-1} = \mathbf I$ since $M$ is assumed to be invertible, we get the following system of equations:

$$ \begin{cases} AW + BY = \mathbf I \\ AX + BZ = \mathbf 0 \\ CW + DY = \mathbf 0 \\ CX + DZ = \mathbf I \end{cases} $$

Left-multiplying the first two equations by $A^{-1}$ and the last two by $D^{-1}$ and re-arranging a little gives

$$ \begin{cases} (\mathbf I - A^{-1} B D^{-1} C)W = A^{-1}\\ (\mathbf I - D^{-1} C A^{-1} B)Z = D^{-1} \end{cases} \quad\text{and}\quad \begin{cases} X = -A^{-1} B Z\\ Y = -D^{-1} C W \end{cases} $$

Let us assume that both $(\mathbf I - A^{-1} B D^{-1} C)$ and $(\mathbf I - D^{-1} C A^{-1} B)$ are invertible; the first two equations can then be further simplified using the fact that $(EF)^{-1} = F^{-1} E^{-1}$ for invertible matrices $E$ and $F$:

$$ \begin{cases} W = (A - BD^{-1} C)^{-1} = (D^s)^{-1}\\ Z = (D - CA^{-1} B)^{-1} = (A^s)^{-1} \end{cases} $$

where the matrix $D^s := (A - BD^{-1} C)$ (resp. $A^s := (D - CA^{-1} B)$) is called the Schur complement of $D$ (resp. $A$). We will come back to these when revisiting the assumptions in a subsequent section.

We started with $MM^{-1} = \mathbf I$ but we could of course have started with $M^{-1} M = \mathbf I$. This gives an equivalent result but the forms obtained for $Y$ and $X$ are different:

$$ \begin{cases} Y = -ZCA^{-1}\\ X = -WBD^{-1} \end{cases} $$
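Before moving on, these block formulas are easy to check numerically; here is a quick sketch with arbitrary block sizes and random blocks (which are almost surely invertible):

```julia
using Test
using LinearAlgebra

nA, nD = 4, 3
A, B = randn(nA, nA), randn(nA, nD)
C, D = randn(nD, nA), randn(nD, nD)
M = [A B; C D]

W = inv(A - B * inv(D) * C)   # = (Dˢ)⁻¹
Z = inv(D - C * inv(A) * B)   # = (Aˢ)⁻¹
X = -inv(A) * B * Z           # from M M⁻¹ = I
Y = -inv(D) * C * W

# the blocks assemble into the inverse of M
@test inv(M) ≈ [W X; Y Z]
# and the alternative forms from M⁻¹ M = I agree
@test Y ≈ -Z * C * inv(A)
@test X ≈ -W * B * inv(D)
```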

Basic lemmas

Equating the expressions in (4) and (6) for $Y$ gives $D^{-1} C W = Z C A^{-1}$ which, combined with (5), gives the first lemma.

(Lemma I) Let $A$ and $D$ be square, invertible matrices of size $n_A \times n_A$ and $n_D \times n_D$, and let $B$ and $C$ be matrices of size $n_A \times n_D$ and $n_D \times n_A$; then the following identity holds:

$$ D^{-1} C (A - BD^{-1} C)^{-1} = (D - CA^{-1} B)^{-1} C A^{-1}. $$

Equating the expressions in (4) and (6) for $X$ gives $WBD^{-1} = A^{-1} B Z$ which, combined with (5), gives the second lemma.

(Lemma II) Under the same assumptions as for Lemma I, the following identity holds:

$$ (A - BD^{-1} C)^{-1} BD^{-1} = A^{-1} B (D - CA^{-1} B)^{-1}. $$
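Both lemmas can be verified numerically in the same style as the script at the end of these notes (random matrices are almost surely invertible):

```julia
using Test

n_A, n_D = 5, 4
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
iA, iD = inv(A), inv(D)

# Lemma I
@test iD * C * inv(A - B * iD * C) ≈ inv(D - C * iA * B) * C * iA
# Lemma II
@test inv(A - B * iD * C) * B * iD ≈ iA * B * inv(D - C * iA * B)
```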

Woodbury formula

One little bit of dark magic is required to get the Woodbury formula: observe that if we take the term $(A - BD^{-1} C)$ and right-multiply it by $-A^{-1}$ we get

$$ (A - BD^{-1} C)(-A^{-1}) = BD^{-1} C A^{-1} - \mathbf I $$

and therefore $BD^{-1} C A^{-1} = \mathbf I + (A - BD^{-1} C)(-A^{-1})$. Now if we post-multiply (8) by $CA^{-1}$ and re-arrange the expression, we get the third lemma.

(Lemma III) Under the same assumptions as for Lemma I, the following identity holds:

$$ (A - BD^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1}. $$
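A quick numerical check of Lemma III:

```julia
using Test

n_A, n_D = 6, 2
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
iA = inv(A)

# Lemma III: inverse of A minus a low-rank-style correction
@test inv(A - B * inv(D) * C) ≈ iA + iA * B * inv(D - C * iA * B) * C * iA
```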

Of course the same gymnastics can be applied to the term $(D - CA^{-1} B)^{-1}$ to obtain a similar identity.

To obtain the classical Woodbury formula, we just need to reassign letters with $E \leftarrow A$, $F \leftarrow -B$, $G \leftarrow D^{-1}$ and $H \leftarrow C$. (So Lemma III is already the Woodbury formula; the re-assignment only leads to a somewhat more visually pleasing form.)

(Woodbury formula) Let $E$ and $G$ be square, invertible matrices of dimensions $n_E \times n_E$ and $n_G \times n_G$ respectively, and let $F$ and $H$ be matrices of size $n_E \times n_G$ and $n_G \times n_E$ respectively; then the following identity holds:

$$ (E + FGH)^{-1} = E^{-1} - E^{-1} F (G^{-1} + H E^{-1} F)^{-1} H E^{-1} $$

Sherman-Morrison formula

Consider again (11) and let $G = 1$, $F = u$ and $H = v^T$ with $u, v \in \mathbb R^{n_E}$; the formula then gives

$$ (E + uv^T)^{-1} = E^{-1} - \frac{E^{-1} u v^T E^{-1}}{1 + v^T E^{-1} u}, $$

a useful expression for the inverse of a matrix with a rank-1 perturbation. This is used for instance in the development of the famous BFGS flavour of quasi-Newton iterations (see e.g. the Wikipedia article).

Revisiting the assumptions

Thanks to Christopher Yeh for the interesting discussion on this topic. Chris also wrote a post on it.

In the development above, we made a few simplifying assumptions, sometimes stronger than needed. We can now make those more precise without risking confusion.

Let us start with the same description of $M$ as in (1); we had introduced the Schur complement of $A$ as $A^s := (D - CA^{-1} B)$, and that of $D$ as $D^s := (A - BD^{-1} C)$.

With those definitions, we have the following set of properties:

(Theorem) Let $M$ be as in (1):

  • (1A) if both $M$ and $A$ are invertible then $A^s$ is invertible,

  • (1B) if both $M$ and $D$ are invertible then $D^s$ is invertible,

  • (2A) if $A$ and $A^s$ are invertible then $M$ is invertible,

  • (2B) if $D$ and $D^s$ are invertible then $M$ is invertible,

  • (3) if both $A$ and $D$ are invertible as well as one of $\{M, A^s, D^s\}$, then they all are.
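One way to see (1A) and (2A) concretely is through the classical determinant identity $\det M = \det A \cdot \det A^s$ (valid whenever $A$ is invertible), which ties the invertibility of $M$ directly to that of $A^s$; a quick check:

```julia
using Test
using LinearAlgebra: det

n = 4
A, D = randn(n, n), randn(n, n)
B, C = randn(n, n), randn(n, n)
M = [A B; C D]
As = D - C * inv(A) * B   # Schur complement of A

# det(M) = det(A) * det(Aˢ): given A invertible, M is invertible iff Aˢ is
@test det(M) ≈ det(A) * det(As)
```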

(1A - proof) Take $z = (x_1, x_2)$ a vector of dimension compatible with $M$ and such that $A^s x_2 = 0$, so that $Dx_2 = CA^{-1} B x_2$. Then, considering $Mz$ gives:

$$ Mz = (Ax_1 + Bx_2,\; Cx_1 + CA^{-1} B x_2). $$

Since $A$ is invertible, we can pick $x_1 = -A^{-1} B x_2$ so that $Mz = 0$. Since $M$ is invertible, $z = 0$ and thus necessarily $x_2 = 0$, so that $A^s$ is invertible. The proof of 1B is identical.

(2A - proof) Let $z = (x_1, x_2)$ be such that $Mz = 0$; then

$$ \begin{array}{rcl} Ax_1 + Bx_2 &=& 0\\ Cx_1 + Dx_2 &=& 0 \end{array} $$

Since $A$ is invertible, we can write $x_1 = -A^{-1} B x_2$ and the second equation becomes

$$ (D - CA^{-1} B) x_2 = 0 $$

or $A^s x_2 = 0$. But since $A^s$ is invertible, $x_2 = 0$ and therefore $x_1 = 0$, so that $z$ is necessarily $0$ and $M$ is invertible. The proof of 2B is identical.

(3 - proof) This follows directly by combining the previous properties.

Back to the development

In the development we were working under (3) with $A$, $D$ and $M$ invertible (and therefore $A^s$ and $D^s$ as well). We had then made the assumption that $(\mathbf I - A^{-1} B D^{-1} C)$ and $(\mathbf I - D^{-1} C A^{-1} B)$ were invertible, but these are actually implied by the invertibility of $D^s$ and $A^s$ respectively. Indeed, taking the first one, if we take $z$ such that

$$ (\mathbf I - A^{-1} B D^{-1} C) z = 0 $$

and left-multiply by $A$, we have

$$ (A - BD^{-1} C) z = D^s z = 0 $$

but since $D^s$ is invertible (by 3), $z = 0$, so that $(\mathbf I - A^{-1} B D^{-1} C)$ is invertible (the second case is identical).
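The factorization behind this argument, $D^s = A(\mathbf I - A^{-1} B D^{-1} C)$, can also be checked numerically; when $D^s$ is invertible, the inverse of the bracketed term is just $(D^s)^{-1} A$:

```julia
using Test
using LinearAlgebra: I

n_A, n_D = 3, 5
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
Ds = A - B * inv(D) * C            # Schur complement of D
T  = I - inv(A) * B * inv(D) * C   # the bracketed term

@test Ds ≈ A * T
@test inv(T) ≈ inv(Ds) * A
```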

A bit of code

If you want to see these equations at work, here's a simple Julia script:

using Test
using LinearAlgebra: dot

# Woodbury formula
n_E, n_G = 13, 15;
E = randn(n_E, n_E);
F = randn(n_E, n_G);
G = randn(n_G, n_G);
H = randn(n_G, n_E);
iE, iG = inv(E), inv(G);
@test inv(E+F*G*H) ≈ iE - iE*F*inv(iG + H*iE*F)*H*iE

# Sherman-Morrison formula
n_E = 23;
E = randn(n_E, n_E);
u = randn(n_E);
v = randn(n_E);
iE = inv(E);
iEu = iE*u;
@test inv(E + u*v') ≈ iE - (iEu*(v'*iE))/(1+dot(v, iEu))

(Recall that invertible matrices are dense among square matrices, so that using randomly generated matrices for $E$ and $G$ is unlikely to cause problems.)

Tim Holy has written a simple package for this called WoodburyMatrices.jl.

using WoodburyMatrices
# re-generate blocks of compatible sizes (E and iE were overwritten
# in the Sherman-Morrison example above)
n_E, n_G = 13, 15;
E = randn(n_E, n_E);
F = randn(n_E, n_G);
G = randn(n_G, n_G);
H = randn(n_G, n_E);
iE, iG = inv(E), inv(G);
W = Woodbury(E, F, G, H);
b = randn(n_E);
# using the package
s1 = W\b;
# hand-coding using the formula
s2 = iE*b - iE*(F*((iG + H*iE*F)\(H*(iE*b))));
@test s1 ≈ s2