# Projected gradient descent

Here we will show a general method to approach a constrained minimisation problem of a convex, differentiable function $f$ over a closed convex set $C\subset \mathbb R^n$. Such problems can be written in an unconstrained form, as discussed in the introduction:

$\begin{array}{rcl} \min_x\quad\!\! f(x) + i_C(x),\end{array}$

where $i_C$ is the indicator function associated with the set.

To get to the method, we will need three main parts:

1. discuss the first order condition (FOC) of the problem, which will bring us to the notion of *normal cone*;
2. discuss how the *normal cone* relates to the *Euclidean projection*;
3. introduce the *projected gradient descent* algorithm.

## First order condition

The FOC indicates that $x^\dagger \in C$ is a minimiser if and only if $0\in \partial(f + i_C)(x^\dagger)$. However, remember that the subdifferential of a sum contains the sum of subdifferentials (see convex analysis pt. 2) so that if $0 \in \nabla f(x^\dagger) + \partial i_C(x^\dagger)$ then $x^\dagger$ is a minimiser.

If $y$ is a subgradient of $i_C$ at a point $x\in C$ then, by definition,

$\begin{array}{rcl} i_C(z) &\ge& i_C(x) + \left\langle z-x, y\right\rangle, \quad\forall z.\end{array}$

For any $z\notin C$, this inequality is trivially verified since the left-hand side is infinite. Taking instead $z\in C$, and since $x\in C$ so that $i_C(x)=i_C(z)=0$, we are left with

$\begin{array}{rcl} 0 &\ge& \left\langle z-x, y\right\rangle, \quad\forall z\in C.\end{array}$

This inequality defines the subdifferential $\partial i_C$ but also happens to be the definition of the **normal cone** to the convex set $C$ at $x$, denoted $N_C(x)$. In short:

$\begin{array}{rcl} \partial i_C(x) &=& \{y\in \mathbb R^n \,\mid\, \left\langle z-x, y\right\rangle\le 0, \forall z\in C\}\quad\!\! =\quad\!\! N_C(x).\end{array}$

Bringing the pieces together, we have that if

$\begin{array}{rcl} -\nabla f(x^\dagger) &\in& N_C(x^\dagger),\end{array}$

then $x^\dagger$ is a minimiser.

Of course this doesn't really help much, because at this point we don't know how to find an $x^\dagger$ verifying this condition. This is what the next section will explore.

## Normal cone and Euclidean projection

A useful (and obvious) property of the normal cone which we shall use in the sequel is that it is invariant under non-negative scalar multiplication, i.e.:

$\begin{array}{rcl} \mathrm{if}\,\, u\in N_C(x)\,\,\mathrm{then}\,\, \alpha u\in N_C(x), \,\forall \alpha \ge 0. \end{array}$

We will now show the connection between the normal cone and the Euclidean projection onto $C$. Remember that the Euclidean projection onto $C$ (denoted $\pi_C$) is defined as follows for any $z\in \mathbb R^n$:

$\pi_C(z) \quad\!\! =\quad\!\! \arg\min_{x\in C}\,\, \frac12\|x-z\|_2^2.$

Note that in the *unconstrained problem*, $C=\mathbb R^n$ and hence $\pi_C=\mathbb I$, the identity operator. Note also that the factor $1/2$ is here for aesthetics when computing the gradient, which is then just $(x-z)$.
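For many simple sets this projection has a closed form. As a quick sketch (using NumPy; the choice of sets here is illustrative, not from the text), the projections onto a box and onto a Euclidean ball:

```python
import numpy as np

def project_box(z, lo, hi):
    # pi_C for C = {x : lo <= x <= hi}: componentwise clipping.
    return np.clip(z, lo, hi)

def project_ball(z, radius=1.0):
    # pi_C for C = {x : ||x||_2 <= radius}: rescale if z lies outside the ball.
    norm = np.linalg.norm(z)
    return z if norm <= radius else (radius / norm) * z
```

For instance, `project_ball(np.array([3.0, 4.0]))` returns `[0.6, 0.8]`, the point of the unit ball closest to $(3, 4)$.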

Applying the first order condition discussed earlier to this projection problem, we have that if

$\begin{array}{rcl} 0\,\in\, (x^\dagger-z) + N_C(x^\dagger),\end{array}$

then $x^\dagger$ is a minimiser and thus $x^\dagger = \pi_C(z)$. This is equivalent to $z \in x^\dagger + N_C(x^\dagger)$, or $z\in (\mathbb I + N_C)(x^\dagger)$. Replacing $x^\dagger$ by $\pi_C(z)$ and re-arranging terms gives

$\begin{array}{rcl} \pi_C(z) &=& (\mathbb I + N_C)^{-1}(z).\end{array}$

This is a classical and important relationship.
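This relationship can be checked numerically: with $C$ the unit ball, the residual $z - \pi_C(z)$ should lie in $N_C(\pi_C(z))$, i.e. $\langle z - \pi_C(z), x - \pi_C(z)\rangle \le 0$ for every $x \in C$, which is exactly $z \in (\mathbb I + N_C)(\pi_C(z))$. A small sketch (the set, dimension and sample sizes are arbitrary choices):

```python
import numpy as np

def project_ball(z, radius=1.0):
    # Euclidean projection onto {x : ||x||_2 <= radius}.
    n = np.linalg.norm(z)
    return z if n <= radius else (radius / n) * z

rng = np.random.default_rng(0)
z = 3.0 * rng.normal(size=5)   # a generic point, typically outside the unit ball
p = project_ball(z)            # p = pi_C(z)
u = z - p                      # residual; should lie in N_C(p)

# <z - p, x - p> <= 0 for all x in C   <=>   z in (I + N_C)(p)
for _ in range(1000):
    x = project_ball(rng.normal(size=5))   # arbitrary point of C
    assert u @ (x - p) <= 1e-12
```

Here the residual is a non-negative multiple of $p$ itself (the outward normal of the ball at $p$), so the inequality holds for every point of the ball.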

## Projected gradient descent

We indicated above that the normal cone is invariant under non-negative scaling: for any $\alpha \ge 0$ and $u\in N_C(x)$, we have $\alpha u\in N_C(x)$. We can rewrite this as:

$\begin{array}{rcl} (x+\alpha u) - x &\in& N_C(x), \end{array}$

which may seem pointless but will lead us to a fixed-point algorithm (remember from the introduction that many algorithms for convex optimisation can be expressed in terms of fixed-point algorithms).

Let $z:=(x+\alpha u)$, so that the relation above can be rearranged to $z\in (\mathbb I+N_C)(x)$. By the inverse relation $\pi_C = (\mathbb I + N_C)^{-1}$, this is equivalent to $x=\pi_C(z)$, or

$\begin{array}{rcl} x &=& \pi_C(x+\alpha u), \quad (\alpha \ge 0, \,u\in N_C(x)). \end{array}$

To finish up, let's go back once more to the first order condition, which indicates that if $x^\dagger$ is such that $-\nabla f(x^\dagger) \in N_C(x^\dagger)$ then it is a minimiser. Combined with the identity above, we get the following useful fixed-point form for the minimiser of the constrained problem:

$\begin{array}{rcl} x^\dagger &=& \pi_C(x^\dagger - \alpha \nabla f(x^\dagger)), \quad \forall \alpha \ge 0.\end{array}$

**Note**: had we not assumed that $f$ was differentiable on $C$, we would still have that there exists a subgradient $f'(x^\dagger) \in \partial f(x^\dagger)$ with $-f'(x^\dagger)\in N_C(x^\dagger)$, and consequently the fixed point $x^\dagger = \pi_C(x^\dagger - \alpha f'(x^\dagger))$, leading to the *projected subgradient method*. In practice however, one tries to cast a convex objective as the sum of a smooth function and a non-smooth one, following a gradient descent on the smooth part and using a projection for the non-smooth part (as is the case here, with $i_C$ being the non-smooth part).
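To make the subgradient variant concrete, here is a tiny illustrative sketch (the problem and step sizes are my own choices, not from the text): minimising the non-smooth $f(x)=|x-2|$ over $C=[-1,1]$, whose constrained minimiser is $x^\dagger = 1$:

```python
import numpy as np

# f(x) = |x - 2| is non-smooth; sign(x - 2) is a valid subgradient.
subgrad = lambda x: np.sign(x - 2.0)

# Projection onto C = [-1, 1] is just clipping.
project = lambda x: np.clip(x, -1.0, 1.0)

x = -1.0
for k in range(1, 200):
    # projected subgradient step with diminishing step size 1/k
    x = project(x - (1.0 / k) * subgrad(x))
# x has converged to the constrained minimiser x = 1.0
```

Since the subgradient is $-1$ everywhere on $C$, each step pushes $x$ to the right until the projection pins it at the boundary point $1$.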

### The algorithm

Well, there's not much left to do. Applying a fixed-point iteration to the fixed-point form above from a starting point $x_0$ generates the sequence $\{x_0, x_1, \dots\}$ via

$\begin{array}{rcl} x_{k+1} &=& \pi_C(x_k - \alpha_k \nabla f(x_k)),\end{array}$

where $\alpha_k > 0$ is the *step size* (ignoring the trivial case $\alpha_k=0$). This is the *projected gradient descent* method.
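The iteration above can be sketched in a few lines of NumPy (the function names, the constant step size, and the example problem are illustrative choices, not prescribed by the method):

```python
import numpy as np

def projected_gradient_descent(grad_f, project, x0, step=0.1, n_iter=500):
    """Iterate x_{k+1} = pi_C(x_k - step * grad_f(x_k)) with a constant step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = project(x - step * grad_f(x))
    return x

# Example: minimise f(x) = 1/2 ||x - z||^2 over the box C = [0, 1]^2;
# the constrained minimiser is the projection of z onto the box.
z = np.array([2.0, -0.5])
x_star = projected_gradient_descent(
    grad_f=lambda x: x - z,              # gradient of the quadratic
    project=lambda x: np.clip(x, 0.0, 1.0),  # projection onto the box
    x0=np.zeros(2),
)
# x_star -> [1.0, 0.0], i.e. clip(z, 0, 1)
```

Note that the projection and the gradient are supplied as separate callables, mirroring the split of the objective into a smooth part (handled by the gradient step) and the indicator $i_C$ (handled by the projection).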

Assuming that the $\alpha_k$ are picked sensibly and basic regularity conditions on the problem are met, the method enjoys a convergence rate $(f(x_k)-f(x^\dagger)) = \mathcal O(k^{-1})$ (see references for more).

**Note**: this method is pretty easy to apply *provided you can compute the projection*. Of course, there may be situations for which computing the projection is as hard as the initial problem! But there are many special cases where the projection can be computed efficiently (e.g. if $C$ is a polyhedron, i.e. the set of $x$ such that $Ax\le b$ for some matrix $A$ and vector $b$). See for example Liu and Ye (2009).
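As one concrete special case (related to the Liu and Ye reference, though the routine below is the classic sort-and-threshold scheme rather than their linear-time method), the projection onto the probability simplex $\{x \,\mid\, x \ge 0,\ \sum_i x_i = 1\}$ can be computed in $\mathcal O(n\log n)$:

```python
import numpy as np

def project_simplex(z):
    # Projection onto {x : x >= 0, sum(x) = 1} via the sort-and-threshold scheme.
    u = np.sort(z)[::-1]                  # sort entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, z.size + 1)
    # largest index rho (1-based) with u_rho + (1 - sum_{i<=rho} u_i)/rho > 0
    rho = np.max(np.nonzero(u * k > css - 1.0)[0]) + 1
    theta = (css[rho - 1] - 1.0) / rho    # shift so the result sums to 1
    return np.maximum(z - theta, 0.0)
```

Points already on the simplex are left unchanged, and the output always sums to one with non-negative entries.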

## Additional references

- **Burke**, *The Gradient Projection Algorithm*, 2014. – Lecture notes at the University of Washington covering the topic in a bit more depth.
- **Saunders**, *Notes on First-Order Methods for Minimizing Smooth Functions*, 2017. – Lecture notes at Stanford covering the topic (among others) and proving the linear convergence rate under regularity conditions.
- **Liu**, **Ye**, *Efficient Euclidean Projections in Linear Time*, 2009. – A paper describing the projection problem and how it can be done efficiently in some cases.

*See also the general references mentioned in the introduction.*