Matrix inversion lemmas

The Woodbury formula is perhaps one of the most ubiquitous tricks in basic linear algebra: it starts with the explicit formula for the inverse of a block 2x2 matrix and leads to identities used in kernel theory, the Kalman filter, the combination of multivariate normals, etc.

In these notes I present a simple development leading to the Woodbury formula and its special case, the Sherman-Morrison formula, with some code showing them at work.

Partitioned matrix inversion

Consider an invertible matrix $M$ made of blocks $A$, $B$, $C$ and $D$ with

$$ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} $$

where both $A$ and $D$ are assumed to be square and invertible. The aim is to express the inverse of $M$ in terms of the blocks $A$, $B$, $C$ and $D$ and the inverses of $A$ and $D$. We can write the inverse of $M$ using an identical structure:

$$ M^{-1} = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}, $$

for some $W$, $X$, $Y$ and $Z$ to be determined. Using $MM^{-1} = \mathbf I$ since $M$ is assumed to be invertible, we get the following system of equations:

$$ \begin{cases} AW + BY = \mathbf I \\ AX + BZ = \mathbf 0 \\ CW + DY = \mathbf 0 \\ CX + DZ = \mathbf I \end{cases} $$

Left-multiplying the first two equations by $A^{-1}$ and the last two by $D^{-1}$ and re-arranging a little gives

$$ \begin{cases} (\mathbf I - A^{-1} B D^{-1} C)W = A^{-1}\\ (\mathbf I - D^{-1} C A^{-1} B)Z = D^{-1} \end{cases} \quad\text{and}\quad \begin{cases} X = -A^{-1} B Z\\ Y = -D^{-1} C W \end{cases} $$

Let us assume that both $(\mathbf I - A^{-1} B D^{-1} C)$ and $(\mathbf I - D^{-1} C A^{-1} B)$ are invertible; the first two equations can then be further simplified using the fact that $(EF)^{-1} = F^{-1} E^{-1}$ for invertible matrices $E$ and $F$:

$$ \begin{cases} W = (A - BD^{-1} C)^{-1} = (D^s)^{-1}\\ Z = (D - CA^{-1} B)^{-1} = (A^s)^{-1} \end{cases} $$

where the matrix $D^s := (A - BD^{-1} C)$ (resp. $A^s := (D - CA^{-1} B)$) is called the Schur complement of $D$ (resp. $A$). We will come back to these when revisiting the assumptions in a subsequent section.

We started with $MM^{-1} = \mathbf I$ but we could of course have started with $M^{-1} M = \mathbf I$. This gives an equivalent result but the forms obtained for $Y$ and $X$ are different:

$$ \begin{cases} Y = -ZCA^{-1}\\ X = -WBD^{-1} \end{cases} $$
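Before moving on, these block formulas are easy to check numerically; here is a quick sketch with arbitrary block sizes and random blocks (which are almost surely invertible):

```julia
using Test
using LinearAlgebra

nA, nD = 4, 3
A, B = randn(nA, nA), randn(nA, nD)
C, D = randn(nD, nA), randn(nD, nD)
M = [A B; C D]

W = inv(A - B * inv(D) * C)   # = (Dˢ)⁻¹
Z = inv(D - C * inv(A) * B)   # = (Aˢ)⁻¹
X = -inv(A) * B * Z           # from M M⁻¹ = I
Y = -inv(D) * C * W

# the blocks assemble into the inverse of M
@test inv(M) ≈ [W X; Y Z]
# and the alternative forms from M⁻¹ M = I agree
@test Y ≈ -Z * C * inv(A)
@test X ≈ -W * B * inv(D)
```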

Basic lemmas

Equating the expressions in (4) and (6) for $Y$ gives $D^{-1} C W = Z C A^{-1}$ which, combined with (5), gives the first lemma.

(Lemma I) Let $A$ and $D$ be square, invertible matrices of size $n_A \times n_A$ and $n_D \times n_D$, and let $B$ and $C$ be matrices of size $n_A \times n_D$ and $n_D \times n_A$; then the following identity holds:

$$ D^{-1} C (A - BD^{-1} C)^{-1} = (D - CA^{-1} B)^{-1} C A^{-1}. $$

Equating the expressions in (4) and (6) for $X$ gives $WBD^{-1} = A^{-1} B Z$ which, combined with (5), gives the second lemma.

(Lemma II) Under the same assumptions as for Lemma I, the following identity holds:

$$ (A - BD^{-1} C)^{-1} BD^{-1} = A^{-1} B (D - CA^{-1} B)^{-1}. $$
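Both lemmas can be verified numerically in the same style as the script at the end of these notes (random matrices are almost surely invertible):

```julia
using Test

n_A, n_D = 5, 4
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
iA, iD = inv(A), inv(D)

# Lemma I
@test iD * C * inv(A - B * iD * C) ≈ inv(D - C * iA * B) * C * iA
# Lemma II
@test inv(A - B * iD * C) * B * iD ≈ iA * B * inv(D - C * iA * B)
```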

Woodbury formula

One little bit of dark magic is required to get the Woodbury formula: observe that if we take the term $(A - BD^{-1} C)$ and right-multiply it by $-A^{-1}$ we get

$$ (A - BD^{-1} C)(-A^{-1}) = BD^{-1} C A^{-1} - \mathbf I $$

and therefore $BD^{-1} C A^{-1} = \mathbf I + (A - BD^{-1} C)(-A^{-1})$. Now if we post-multiply (8) by $CA^{-1}$ and re-arrange the expression, we get the third lemma.

(Lemma III) Under the same assumptions as for Lemma I, the following identity holds:

$$ (A - BD^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1}. $$
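A quick numerical check of Lemma III:

```julia
using Test

n_A, n_D = 6, 2
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
iA = inv(A)

# Lemma III: inverse of A minus a low-rank-style correction
@test inv(A - B * inv(D) * C) ≈ iA + iA * B * inv(D - C * iA * B) * C * iA
```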

Of course the same gymnastics can be applied to the term $(D - CA^{-1} B)^{-1}$ to obtain a similar identity.

To obtain the classical Woodbury formula, we just need to reassign letters with $E \leftarrow A$, $F \leftarrow -B$, $G \leftarrow D^{-1}$ and $H \leftarrow C$. (So Lemma III is already the Woodbury formula; the re-assignment only leads to a somewhat more visually pleasing form.)

(Woodbury formula) Let $E$ and $G$ be square, invertible matrices of dimensions $n_E \times n_E$ and $n_G \times n_G$ respectively, and let $F$ and $H$ be matrices of size $n_E \times n_G$ and $n_G \times n_E$ respectively; then the following identity holds:

$$ (E + FGH)^{-1} = E^{-1} - E^{-1} F (G^{-1} + H E^{-1} F)^{-1} H E^{-1} $$

Sherman-Morrison formula

Consider again (11) and let $G = 1$, $F = u$ and $H = v^T$ with $u, v \in \mathbb R^{n_E}$; the formula then gives

$$ (E + uv^T)^{-1} = E^{-1} - \frac{E^{-1} u v^T E^{-1}}{1 + v^T E^{-1} u}, $$

a useful expression for the inverse of a matrix with a rank-1 perturbation. This is used for instance in the development of the famous BFGS flavour of quasi-Newton iterations (see e.g. the Wikipedia article).

Revisiting the assumptions

Thanks to Christopher Yeh for the interesting discussion on this topic. Chris also wrote a post on it.

In the development above, we made a few simplifying assumptions, sometimes stronger than needed. We can now make those more precise without risking confusion.

Let us start with the same description of $M$ as in (1); we had introduced the Schur complement of $A$ as $A^s := (D - CA^{-1} B)$, and that of $D$ as $D^s := (A - BD^{-1} C)$.

With those definitions, we have the following set of properties:

(Theorem) Let $M$ be as in (1):

  • (1A) if both $M$ and $A$ are invertible then $A^s$ is invertible,

  • (1B) if both $M$ and $D$ are invertible then $D^s$ is invertible,

  • (2A) if $A$ and $A^s$ are invertible then $M$ is invertible,

  • (2B) if $D$ and $D^s$ are invertible then $M$ is invertible,

  • (3) if both $A$ and $D$ are invertible as well as one of $\{M, A^s, D^s\}$, then they all are.
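One way to see (1A) and (2A) concretely is through the classical determinant identity $\det M = \det A \cdot \det A^s$ (valid whenever $A$ is invertible), which ties the invertibility of $M$ directly to that of $A^s$; a quick check:

```julia
using Test
using LinearAlgebra: det

n = 4
A, D = randn(n, n), randn(n, n)
B, C = randn(n, n), randn(n, n)
M = [A B; C D]
As = D - C * inv(A) * B   # Schur complement of A

# det(M) = det(A) * det(Aˢ): given A invertible, M is invertible iff Aˢ is
@test det(M) ≈ det(A) * det(As)
```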

(1A - proof) Take $z = (x_1, x_2)$ a vector of dimension compatible with $M$ and such that $A^s x_2 = 0$, so that $Dx_2 = CA^{-1} B x_2$. Then, considering $Mz$ gives:

$$ Mz = (Ax_1 + Bx_2,\; Cx_1 + CA^{-1} B x_2). $$

Since $A$ is invertible, we can pick $x_1 = -A^{-1} B x_2$ so that $Mz = 0$. Since $M$ is invertible, $z = 0$ and thus necessarily $x_2 = 0$, so that $A^s$ is invertible. The proof of 1B is identical.

(2A - proof) Let $z = (x_1, x_2)$ be such that $Mz = 0$; then

$$ \begin{array}{rcl} Ax_1 + Bx_2 &=& 0\\ Cx_1 + Dx_2 &=& 0 \end{array} $$

Since $A$ is invertible, we can write $x_1 = -A^{-1} B x_2$ and the second equation becomes

$$ (D - CA^{-1} B) x_2 = 0 $$

or $A^s x_2 = 0$. But since $A^s$ is invertible, $x_2 = 0$ and therefore $x_1 = 0$, so that $z$ is necessarily $0$ and $M$ is invertible. The proof of 2B is identical.

(3 - proof) This follows directly by combining the previous properties.

Back to the development

In the development we were working under (3) with $A$, $D$ and $M$ invertible (and therefore $A^s$ and $D^s$ as well). We had then made the assumption that $(\mathbf I - A^{-1} B D^{-1} C)$ and $(\mathbf I - D^{-1} C A^{-1} B)$ were invertible, but these are actually implied by the invertibility of $D^s$ and $A^s$ respectively. Indeed, taking the first one, if we take $z$ such that

$$ (\mathbf I - A^{-1} B D^{-1} C) z = 0 $$

and left-multiply by $A$, we have

$$ (A - BD^{-1} C) z = D^s z = 0 $$

but since $D^s$ is invertible (by 3), $z = 0$, so that $(\mathbf I - A^{-1} B D^{-1} C)$ is invertible (the second case is identical).
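The factorization behind this argument, $D^s = A(\mathbf I - A^{-1} B D^{-1} C)$, can also be checked numerically; when $D^s$ is invertible, the inverse of the bracketed term is just $(D^s)^{-1} A$:

```julia
using Test
using LinearAlgebra: I

n_A, n_D = 3, 5
A, D = randn(n_A, n_A), randn(n_D, n_D)
B, C = randn(n_A, n_D), randn(n_D, n_A)
Ds = A - B * inv(D) * C            # Schur complement of D
T  = I - inv(A) * B * inv(D) * C   # the bracketed term

@test Ds ≈ A * T
@test inv(T) ≈ inv(Ds) * A
```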

A bit of code

If you want to see these equations at work, here's a simple Julia script:

using Test
using LinearAlgebra: dot

# Woodbury formula
n_E, n_G = 13, 15;
E = randn(n_E, n_E);
F = randn(n_E, n_G);
G = randn(n_G, n_G);
H = randn(n_G, n_E);
iE, iG = inv(E), inv(G);
@test inv(E+F*G*H) ≈ iE - iE*F*inv(iG + H*iE*F)*H*iE

# Sherman-Morrison formula
n_E = 23;
E = randn(n_E, n_E);
u = randn(n_E);
v = randn(n_E);
iE = inv(E);
iEu = iE*u;
@test inv(E + u*v') ≈ iE - (iEu*(v'*iE))/(1+dot(v, iEu))

(Recall that invertible matrices are dense among square matrices, so that using randomly generated matrices for $E$ and $G$ is unlikely to cause problems.)

Tim Holy has written a simple package for this called WoodburyMatrices.jl.

using WoodburyMatrices
# re-generate blocks of compatible sizes (E and iE were overwritten
# in the Sherman-Morrison example above)
n_E, n_G = 13, 15;
E = randn(n_E, n_E);
F = randn(n_E, n_G);
G = randn(n_G, n_G);
H = randn(n_G, n_E);
iE, iG = inv(E), inv(G);
W = Woodbury(E, F, G, H);
b = randn(n_E);
# using the package
s1 = W\b;
# hand-coding using the formula
s2 = iE*b - iE*(F*((iG + H*iE*F)\(H*(iE*b))));
@test s1 ≈ s2