The Woodbury formula is perhaps one of the most ubiquitous tricks in basic linear algebra: it starts with the explicit formula for the inverse of a block $2\times 2$ matrix and results in identities that are used in kernel methods, the Kalman filter, the combination of multivariate normals, etc.
In these notes I present a simple development leading to the Woodbury formula, and the special case of the Sherman-Morrison formula, with some code showing both at work.
Consider an invertible matrix $M$ made of blocks $A$, $B$, $C$ and $D$ with

$$ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \tag{1} $$
where both $A$ and $D$ are assumed to be square and invertible. The aim is to express the inverse of $M$ in terms of the blocks $A$, $B$, $C$ and $D$ and the inverses of $A$ and $D$. We can write the inverse of $M$ using an identical structure:

$$ M^{-1} = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix} \tag{2} $$
for some $W$, $X$, $Y$ and $Z$ to be determined. Using $MM^{-1} = I$, since $M$ is assumed to be invertible, we get the following system of equations:

$$ \begin{cases} AW + BY = I \\ AX + BZ = 0 \\ CW + DY = 0 \\ CX + DZ = I \end{cases} \tag{3} $$
Left-multiplying the first two equations by $A^{-1}$, the last two by $D^{-1}$, and re-arranging a little (substituting the expressions for $X$ and $Y$ into the equations for $Z$ and $W$) gives

$$ \begin{cases} (I - A^{-1}BD^{-1}C)W = A^{-1} \\ (I - D^{-1}CA^{-1}B)Z = D^{-1} \\ X = -A^{-1}BZ \\ Y = -D^{-1}CW \end{cases} \tag{4} $$
Let us assume for now that both $(I - A^{-1}BD^{-1}C)$ and $(I - D^{-1}CA^{-1}B)$ are invertible; then the first two equations can be further simplified using that $V^{-1}U^{-1} = (UV)^{-1}$ for invertible matrices $U$ and $V$:

$$ W = (A - BD^{-1}C)^{-1}, \qquad Z = (D - CA^{-1}B)^{-1} \tag{5} $$
where the matrix $(A - BD^{-1}C)$ (resp. $(D - CA^{-1}B)$) is called the Schur complement of $D$ (resp. $A$) in $M$. We will come back to these when revisiting the assumptions made here in a subsequent point.
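Before moving on, here is a quick numerical sanity check of (4) and (5); this is a sketch of mine (the block sizes are arbitrary), relying on the fact that randomly generated matrices are almost surely invertible:

using Test

# build a random block matrix M = [A B; C D] and check (4)-(5)
n_A, n_D = 4, 6;
A = randn(n_A, n_A); D = randn(n_D, n_D);
B = randn(n_A, n_D); C = randn(n_D, n_A);
M = [A B; C D];
W = inv(A - B*inv(D)*C);   # inverse of the Schur complement of D
Z = inv(D - C*inv(A)*B);   # inverse of the Schur complement of A
X = -inv(A)*B*Z;
Y = -inv(D)*C*W;
@test inv(M) ≈ [W X; Y Z]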
We started with $MM^{-1} = I$ but we could of course have started with $M^{-1}M = I$. This gives an equivalent result, but the form obtained for $X$ and $Y$ is different:

$$ X = -WBD^{-1}, \qquad Y = -ZCA^{-1} \tag{6} $$
Equating the expressions in (4) and (6) for $Y$ gives $D^{-1}CW = ZCA^{-1}$ which, combined with (5), gives the first lemma.
(Lemma I) Let $A$ and $D$ be square, invertible matrices of size $n_A \times n_A$ and $n_D \times n_D$, and let $B$ and $C$ be matrices of size $n_A \times n_D$ and $n_D \times n_A$; then the following identity holds:

$$ D^{-1}C(A - BD^{-1}C)^{-1} = (D - CA^{-1}B)^{-1}CA^{-1} \tag{7} $$
Equating the expressions in (4) and (6) for $X$ gives $A^{-1}BZ = WBD^{-1}$ which, combined with (5), gives the second lemma.

(Lemma II) Under the same assumptions as in Lemma I, the following identity holds:

$$ A^{-1}B(D - CA^{-1}B)^{-1} = (A - BD^{-1}C)^{-1}BD^{-1} \tag{8} $$
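Both identities are easy to check numerically; here is a small sketch of mine with random (hence almost surely invertible) matrices:

using Test

# numerical check of Lemma I (7) and Lemma II (8)
n_A, n_D = 5, 7;
A = randn(n_A, n_A); D = randn(n_D, n_D);
B = randn(n_A, n_D); C = randn(n_D, n_A);
iA, iD = inv(A), inv(D);
@test iD*C*inv(A - B*iD*C) ≈ inv(D - C*iA*B)*C*iA   # Lemma I
@test iA*B*inv(D - C*iA*B) ≈ inv(A - B*iD*C)*B*iD   # Lemma II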
One little bit of dark magic is required to get the Woodbury formula: observe that if we take the term $(A - BD^{-1}C)$ and right-multiply it by $A^{-1}$ we get

$$ (A - BD^{-1}C)A^{-1} = I - BD^{-1}CA^{-1} \tag{9} $$
and therefore $BD^{-1}CA^{-1} = I - (A - BD^{-1}C)A^{-1}$. Now if we post-multiply (8) by $CA^{-1}$, the right-hand side becomes $(A - BD^{-1}C)^{-1}BD^{-1}CA^{-1} = (A - BD^{-1}C)^{-1} - A^{-1}$; re-arranging the expression, we get the third lemma.

(Lemma III) Under the same assumptions as in Lemma I, the following identity holds:

$$ (A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} \tag{10} $$
Of course the same gymnastics can be applied with the term $(D - CA^{-1}B)$ to obtain a similar identity.
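Again, a quick numerical check of (10), in the same spirit as the snippets above:

using Test

# numerical check of Lemma III (10)
n_A, n_D = 5, 7;
A = randn(n_A, n_A); D = randn(n_D, n_D);
B = randn(n_A, n_D); C = randn(n_D, n_A);
iA, iD = inv(A), inv(D);
@test inv(A - B*iD*C) ≈ iA + iA*B*inv(D - C*iA*B)*C*iA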
To obtain the classical Woodbury formula, we just need to reassign letters with $A = E$, $B = F$, $C = H$ and $D = -G^{-1}$. (So Lemma III is already the Woodbury formula; the re-assignment only leads to a somewhat more visually pleasing form.)
(Woodbury formula) Let $E$ and $G$ be square invertible matrices of dimensions $n_E \times n_E$ and $n_G \times n_G$ respectively, and let $F$ and $H$ be matrices of size $n_E \times n_G$ and $n_G \times n_E$ respectively; then the following identity holds:

$$ (E + FGH)^{-1} = E^{-1} - E^{-1}F(G^{-1} + HE^{-1}F)^{-1}HE^{-1} \tag{11} $$
Consider again (11) and let $F = u$ and $H = v^t$ for two vectors $u, v \in \mathbb{R}^{n_E}$, and $G = 1$; then the formula gives

$$ (E + uv^t)^{-1} = E^{-1} - \frac{E^{-1}uv^tE^{-1}}{1 + v^tE^{-1}u}, \tag{12} $$
a useful expression for the inverse of a matrix combined with a rank-1 perturbation, known as the Sherman-Morrison formula. It is used for instance in the development of the famous BFGS flavour of the Quasi-Newton iterations (see e.g. the Wikipedia article).
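To illustrate why this matters in that setting, here is a small sketch (the helper name `sm_update` is mine, not a standard API): maintaining the inverse of a matrix across a rank-1 perturbation via (12) costs $O(n^2)$, instead of the $O(n^3)$ of a fresh inversion:

using Test
using LinearAlgebra: dot

# hypothetical helper implementing (12): updates iB = inv(B) to inv(B + u*v')
function sm_update(iB, u, v)
    iBu = iB*u
    return iB - (iBu*(v'*iB))/(1 + dot(v, iBu))
end

n = 10;
B = randn(n, n);
iB = inv(B);
u, v = randn(n), randn(n);
@test sm_update(iB, u, v) ≈ inv(B + u*v')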
Thanks to Christopher Yeh for the interesting discussion on this topic; Chris also wrote a post about it.
In the development above, we made a few assumptions to simplify the derivation, sometimes stronger than needed. We can now make those more precise without risking confusion.
Let us start with the same description of $M$ as in (1); we had introduced the Schur complement of $D$ as $(M/D) = A - BD^{-1}C$, and that of $A$ as $(M/A) = D - CA^{-1}B$.
With those definitions, we have the following set of properties:
(Theorem) Let $M$ be as in (1):
(1A) if both $A$ and $(M/A)$ are invertible then $M$ is invertible,
(1B) if both $D$ and $(M/D)$ are invertible then $M$ is invertible,
(2A) if $M$ and $A$ are invertible then $(M/A)$ is invertible,
(2B) if $M$ and $D$ are invertible then $(M/D)$ is invertible,
(3) if both $A$ and $D$ are invertible as well as one of $M$, $(M/A)$ and $(M/D)$, then they all are.
(1A - proof) Take a vector $z = (x, y)^t$ of dimension compatible with $M$ and such that $Mz = 0$, so that it suffices to show $z = 0$. Then, considering $Mz = 0$ blockwise gives:

$$ \begin{cases} Ax + By = 0 \\ Cx + Dy = 0 \end{cases} $$
Since $A$ is invertible, we can form $x = -A^{-1}By$ from the first equation, so that the second becomes $(D - CA^{-1}B)y = (M/A)y = 0$. Since $(M/A)$ is invertible, $y = 0$, and thus necessarily $x = 0$, so that $z = 0$ and $M$ is invertible. The proof of 1B is identical.
(2A - proof) Let $y$ be such that $(M/A)y = 0$ and, since $A$ is invertible, set $x = -A^{-1}By$; then, for $z = (x, y)^t$,

$$ Mz = \begin{pmatrix} Ax + By \\ Cx + Dy \end{pmatrix} = \begin{pmatrix} 0 \\ (D - CA^{-1}B)y \end{pmatrix} $$

or $Mz = 0$. But since $M$ is invertible, $z = 0$ and therefore $y$ is necessarily $0$, so that $(M/A)$ is invertible. The proof of 2B is identical.
(3 - proof) This is trivially implied by combining the previous properties.
In the development we were working under property (3) with $M$, $A$ and $D$ invertible (and therefore $(M/A)$ and $(M/D)$ as well). We had then made the assumption that $(I - A^{-1}BD^{-1}C)$ and $(I - D^{-1}CA^{-1}B)$ were invertible, but these are actually implied by the fact that, respectively, $(M/D)$ and $(M/A)$ are invertible. Indeed, taking the first one, if we take $x$ such that

$$ (I - A^{-1}BD^{-1}C)x = 0 $$
and left-multiply by $A$, we have

$$ (A - BD^{-1}C)x = 0, $$
but since $(M/D)$ is invertible (by 3), $x = 0$, so that $(I - A^{-1}BD^{-1}C)$ is invertible (the second case is identical).
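A minimal example (of mine) of why these hypotheses matter: $M$ can perfectly well be invertible while $A$ is not, in which case $(M/A)$ is not even defined and the blockwise development above cannot be used as-is:

using Test
using LinearAlgebra: det

# M = [0 1; 1 0] is invertible even though its top-left block A = 0 is not,
# so (M/A) = D - C*inv(A)*B is undefined in this case
A = zeros(1, 1); B = ones(1, 1);
C = ones(1, 1);  D = zeros(1, 1);
M = [A B; C D];
@test det(M) != 0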
If you want to see these equations at work, here's a simple Julia script:
using Test
using LinearAlgebra: dot
# Woodbury formula
n_E, n_G = 13, 15;
E = randn(n_E, n_E);
F = randn(n_E, n_G);
G = randn(n_G, n_G);
H = randn(n_G, n_E);
iE, iG = inv(E), inv(G);
@test inv(E+F*G*H) ≈ iE - iE*F*inv(iG + H*iE*F)*H*iE
# Sherman-Morrison formula (fresh matrix S and vectors u, v; E, F, G, H
# defined above are reused in the WoodburyMatrices example below)
n_S = 23;
S = randn(n_S, n_S);
u = randn(n_S);
v = randn(n_S);
iS = inv(S);
iSu = iS*u;
@test inv(S + u*v') ≈ iS - (iSu*(v'*iS))/(1 + dot(v, iSu))
(Recall that invertible matrices are dense among square matrices, so that using randomly generated matrices for $E$ and $G$ is unlikely to cause problems.)
Tim Holy has written a simple package for this called WoodburyMatrices.jl.
using WoodburyMatrices
W = Woodbury(E, F, G, H);   # represents E + F*G*H without forming it explicitly
b = randn(n_E);
# using the package
s1 = W\b;
# hand-coding using the formula
iEb = iE*b;
s2 = iEb - iE*(F*((iG + H*iE*F)\(H*iEb)));
@test s1 ≈ s2