The aim of the notes you'll find here is to suggest a direct, constructive path to results I find useful, beautiful, or simply interesting. This stems mainly from my experience reading papers where the presentation sometimes trades clarity for generality.
The emphasis is on intuition rather than rigour, though I try to indicate where simplifications are made and what they may imply. Often, these simplifications allow for straightforward demonstrations. The references to more technical work can then help in learning the results in full generality.
Target audience: advanced undergrads or grads in quantitative fields such as applied maths, stats, comp-sci, etc., assuming a decent background in basic maths (in particular linear algebra, real analysis and probability theory). When the level of a note is judged (somewhat arbitrarily) to be a bit more advanced, a "⭒" symbol is prepended.
Errors & Comments: if you find anything dubious in the notes, please send me an email; feedback is always much appreciated.
approximate Bayesian inference (assumes knowledge of the Bayesian framework, familiarity with the exponential family and convex optimisation; these notes are mostly adapted from a section of my PhD thesis)
kernel methods (assumes good knowledge of stats and real analysis)
convex optimisation (assumes familiarity with convexity):
introduction: the general minimisation problem and a hint at generic iterative methods.
convex analysis part 1: the subdifferential and the first-order optimality condition.
(⭒) convex analysis part 2: the convex conjugate along with some useful properties such as Fenchel's inequality and the Fenchel-Moreau theorem.
(⭒) convex analysis part 3: strict and strong convexity, the Bregman divergence and the link between Lipschitz continuity and strong convexity.
projected gradient descent: normal cone, Euclidean projection and projected gradient descent.
mirror descent algorithm: generalised projected gradient descent and the mirror descent algorithm.
thoughts on first-order methods: first-order methods, minimising sequences, admissible directions, generalised projected gradient descent (again).
(⭒) splitting methods and ADMM: splitting methods in optimisation, proximal methods and ADMM.
matrix inversion lemmas: re-obtaining the Woodbury formula and the Sherman-Morrison formula (with some code).