Here we will show a general method to approach a constrained minimisation problem of a convex, differentiable function $f$ over a closed convex set $C$. Such problems can be written in an unconstrained form as we discussed in the introduction

$$ \min_{x} \;\; f(x) + i_C(x), $$

where $i_C$ is the indicator function associated with the set $C$.
To get to the method, we will need three main parts:
1. discuss the first order condition (FOC) of the problem, which will bring us to the notion of normal cone,
2. discuss how the normal cone relates to the Euclidean projection,
3. introduce the projected gradient descent algorithm.
The FOC indicates that $x^\dagger$ is a minimiser if and only if $0 \in \partial(f + i_C)(x^\dagger)$. However, remember that the subdifferential of a sum contains the sum of subdifferentials (see convex analysis pt. 2) so that if $0 \in \nabla f(x^\dagger) + \partial i_C(x^\dagger)$ then $x^\dagger$ is a minimiser.
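Indeed, if $u \in \partial f(x)$ and $v \in \partial i_C(x)$, summing the two subgradient inequalities gives

$$ (f + i_C)(z) \;\ge\; (f + i_C)(x) + \langle u + v, z - x\rangle, \quad \forall z, $$

so that $u + v \in \partial(f + i_C)(x)$; in particular, $0 \in \nabla f(x^\dagger) + \partial i_C(x^\dagger)$ implies $0 \in \partial(f + i_C)(x^\dagger)$.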
If $u$ is a subgradient of $i_C$ at a point $x \in C$ then, by definition,

$$ i_C(z) \;\ge\; i_C(x) + \langle u, z - x\rangle, \quad \forall z. $$

For any $z \notin C$, this inequality is trivially verified since the left hand side is infinite. Letting $z \in C$ and since $x \in C$ so that $i_C(z) = i_C(x) = 0$, we are left with

$$ \langle u, z - x\rangle \;\le\; 0, \quad \forall z \in C. $$

This inequality defines the subdifferential of $i_C$ at $x \in C$ but also happens to be the definition of the normal cone to the convex set $C$ at $x$, denoted $N_C(x)$. In short:

$$ \partial i_C(x) \;=\; N_C(x) \;=\; \{\, u \;:\; \langle u, z - x\rangle \le 0, \;\; \forall z \in C \,\}. $$

Bringing the pieces together, we have

$$ -\nabla f(x^\dagger) \,\in\, N_C(x^\dagger) \quad\Longrightarrow\quad x^\dagger \text{ is a minimiser}. \tag{5} $$
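As a quick illustration, take for instance $C = \mathbb{R}^n_+$, the non-negative orthant. For $x \in C$, requiring $\langle u, z - x\rangle \le 0$ for all $z \ge 0$ gives

$$ N_C(x) \;=\; \{\, u \;:\; u_i \le 0 \text{ if } x_i = 0, \;\; u_i = 0 \text{ if } x_i > 0 \,\}, $$

so that at an interior point the normal cone reduces to $\{0\}$ and condition (5) collapses to the familiar $\nabla f(x^\dagger) = 0$.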
Of course this doesn't really help much because at this point we don't know how to find an $x^\dagger$ such that the left-hand side holds. This is what the next section will explore.
A useful (and obvious) property of the normal cone, which we shall use in the sequel, is that it is invariant under non-negative scalar multiplication, i.e.:

$$ u \in N_C(x) \quad\Longrightarrow\quad \alpha u \in N_C(x), \quad \forall \alpha \ge 0. \tag{6} $$
We will now show the connection between the normal cone and the Euclidean projection on $C$. Remember that the Euclidean projection onto $C$ (denoted $P_C$) is defined as follows for any $x$:

$$ P_C(x) \;:=\; \arg\min_{z \in C} \; \frac12 \|x - z\|_2^2. \tag{7} $$

Note that in the unconstrained problem, $C = \mathbb{R}^n$ and hence $P_C = \mathbf{I}$, the identity operator. Note also that the factor $\frac12$ is here for aesthetics when computing the gradient which is then just $(z - x)$.
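To make this a bit more tangible, here is a minimal sketch (in Python/NumPy; the function names are just for illustration) of two projections that admit a simple closed form: the box $\{z : \ell \le z \le u\}$ and the Euclidean ball of radius $r$ centred at the origin.

```python
import numpy as np

def project_box(x, lo, hi):
    # P_C(x) for C = {z : lo <= z <= hi}: clip each coordinate to the box.
    return np.clip(x, lo, hi)

def project_ball(x, r):
    # P_C(x) for C = {z : ||z||_2 <= r}: rescale x if it lies outside the ball.
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x
```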
Considering the condition (5) for the optimisation problem (7), we have that if

$$ (x - z^\dagger) \,\in\, N_C(z^\dagger), $$

then $z^\dagger$ is a minimiser of (7) and thus $z^\dagger = P_C(x)$. This is equivalent to $(x - P_C(x)) \in N_C(P_C(x))$ or, writing $u := x - P_C(x)$, to $u \in N_C(P_C(x))$ with $P_C(x) + u = x$. Replacing $x$ by $x + u$ with $x \in C$ and $u \in N_C(x)$ and re-arranging terms gives

$$ x \;=\; P_C(x + u) \quad\Longleftrightarrow\quad u \in N_C(x), \qquad \forall x \in C. \tag{10} $$
This is a classical and important relationship.
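A quick one-dimensional sanity check: take $C = [0, 1]$ and $x = 1$, so that $N_C(1) = [0, \infty)$. For any $u \ge 0$ we indeed get

$$ P_C(1 + u) \;=\; \min(\max(1 + u, 0), 1) \;=\; 1, $$

while for any $u < 0$ (which is not in the cone) the projection $P_C(1 + u) = \max(1 + u, 0) < 1$, consistent with (10).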
We indicated at (6) that for any $\alpha \ge 0$ and $u \in N_C(x)$, we have $\alpha u \in N_C(x)$. We can rewrite this as:

$$ \alpha u \,\in\, N_C(x), \qquad \forall \alpha \ge 0, \;\; \forall u \in N_C(x), \tag{11} $$

which may seem pointless but will lead us to a fixed-point algorithm (remember from the introduction that many algorithms for convex optimisation can be expressed in terms of fixed-point algorithms).

Let $v = \alpha u$ so that (11) can be rearranged to $v \in N_C(x)$. By (10), this is equivalent to $x = P_C(x + v)$ or

$$ x \;=\; P_C(x + \alpha u), \qquad \forall x \in C, \;\; \forall u \in N_C(x), \;\; \forall \alpha \ge 0. \tag{12} $$
To finish up, let's go back once more to the FOC (5) which indicates that if $x^\dagger$ is such that $-\nabla f(x^\dagger) \in N_C(x^\dagger)$ then it is a minimiser. Combined with (12) (taking $u = -\nabla f(x^\dagger)$), we get the following useful fixed-point form for the minimiser of the constrained problem:

$$ x^\dagger \;=\; P_C\big(x^\dagger - \alpha \nabla f(x^\dagger)\big), \qquad \forall \alpha \ge 0. \tag{13} $$
Note: had we not assumed that $f$ was differentiable on $C$, we would still have that there exists a subgradient $g \in \partial f(x^\dagger)$ with $-g \in N_C(x^\dagger)$ and consequently the fixed point $x^\dagger = P_C(x^\dagger - \alpha g)$, leading to the projected subgradient method. In practice however, one tries to cast a convex objective function as the sum of a smooth function and a non-smooth one, to follow a gradient descent on the smooth part and use a projection for the non-smooth part (this is the case here with $i_C$ being the non-smooth part).
Well, there's not much left to do. Applying a fixed-point iteration to (13) from a starting point $x_0$ generates the sequence of iterates $(x_k)$ via

$$ x_{k+1} \;=\; P_C\big(x_k - \alpha_k \nabla f(x_k)\big), $$

where $\alpha_k > 0$ is the step size (ignoring the trivial case $\alpha_k = 0$). This is the projected gradient descent method.
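To illustrate, here is a minimal sketch in Python/NumPy; the quadratic objective, the box constraint, the fixed step size and the fixed number of iterations are all arbitrary choices made for the example.

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, alpha=0.1, niter=200):
    # Fixed-point iteration (13): x_{k+1} = P_C(x_k - alpha * grad(x_k)).
    x = x0
    for _ in range(niter):
        x = project(x - alpha * grad(x))
    return x

# Example: minimise f(x) = 0.5 x'Ax - b'x over the box C = [0, 1]^2.
A = np.array([[3.0, 0.5],
              [0.5, 2.0]])                 # positive definite
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b                 # gradient of the quadratic
project = lambda x: np.clip(x, 0.0, 1.0)   # projection onto the box

x_opt = projected_gradient_descent(grad, project, x0=np.zeros(2))
```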
Assuming that the step sizes $\alpha_k$ are picked sensibly and basic regularity conditions on the problem are met, the method enjoys a linear convergence rate (see references for more).
Note: this method is pretty easy to apply provided you can compute the projection. Of course, there may be situations for which computing the projection is as hard as the initial problem! But there are many special cases where an efficient projection can be applied (e.g. if $C$ is a polyhedron, i.e. corresponds to the set of $x$ such that $Ax \le b$ for some matrix $A$ and vector $b$). See for example Liu and Ye (2009).
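One of the simplest polyhedral cases is a single halfspace $\{z : a^\top z \le b\}$, for which the projection has a closed form; here is a small sketch (Python/NumPy, for illustration only):

```python
import numpy as np

def project_halfspace(x, a, b):
    # P_C(x) for C = {z : a'z <= b}: if the constraint is violated,
    # move x back along a by the violation divided by ||a||^2.
    viol = a @ x - b
    return x if viol <= 0 else x - (viol / (a @ a)) * a
```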
Burke, The Gradient Projection Algorithm, 2014. – Lecture notes at the University of Washington covering the topic in a bit more depth.
Saunders, Notes on First-Order Methods for Minimizing Smooth Functions, 2017. – Lecture notes at Stanford covering the topic (among others) and proving the linear convergence rate under regularity conditions.
Liu, Ye, Efficient Euclidean Projections in Linear Time, 2009. – A paper describing the projection problem and how it can be done efficiently in some cases.
See also the general references mentioned in the introduction.