## Convexity and Jensen's Inequality

Intuitively, a convex function is one that is upward-curving. For example, $f(x)=x^2$ is a convex function. There are generalizations to cover functions of more than one variable, but for now, I'm just going to talk about functions that take one real number as input and have one real number as output, i.e. $f: X \rightarrow {\Bbb R}$ where $X$ is an interval of the real line.

The first derivative of a convex function is non-decreasing, which means the second derivative is non-negative. This is clearly the case for our example $f(x)=x^2$, since $f'(x)=2x$ and $f''(x)=2$.

Now, if you choose two values of $x$, $x_1$ and $x_2$, and draw the line connecting the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ (the secant line), then this line will not go below the graph of $f(x)$ in between $x=x_1$ and $x=x_2$. As an example, let's choose $f(x)=x^2$, $x_1=1$, and $x_2=3$. Then $f(1)=1$ and $f(3)=9$, so the slope of the secant line is $\frac{9-1}{3-1}=\frac{8}{2}=4$, and the equation of the line in point-slope form is $y-9=4(x-3)$, or in $y=mx+b$ form, $y=4x-3$. If we graph this, we see that the secant line is indeed above the graph of $f(x)$ between $x=1$ and $x=3$:
We can also write the equation of the secant line using a parameter $t$; for example, if I want to linearly interpolate to find the point on the line a quarter of the way between 1 and 3 (i.e. $t=\frac{1}{4}$), it would be at $$\begin{multline} \shoveleft{x = 1+\frac{1}{4} \cdot (3-1) = 1.5} \\ \shoveleft{y = f(1) + \frac{1}{4} \cdot (f(3) - f(1)) = 1+\frac{1}{4} \cdot (9-1) = 3} \end{multline}$$ Thus, the secant line is the set of points $$\begin{multline} \shoveleft{x(t) = x_1 + t (x_2 - x_1)} \\ \shoveleft{y(t) = f(x_1) + t(f(x_2)-f(x_1))} \\ \shoveleft{t \in {\Bbb R}} \end{multline}$$ with the part of the line between $x_1$ and $x_2$ corresponding to $0 \leq t \leq 1$.

Combining like terms and replacing $t$ with $(1-t)$ (which is fine since $t$ is an arbitrary parameter; furthermore $0 \leq 1-t \leq 1 \iff 0 \leq t \leq 1$ ), we can write the secant line as the set of points $$\begin{multline} \shoveleft{x(t) = tx_1 + (1-t) x_2} \\ \shoveleft{y(t) = tf(x_1) + (1-t)f(x_2)} \\ \shoveleft{t \in {\Bbb R}} \end{multline}$$ The fact that the secant line lies above the graph of the function is the essence of the formal definition:

Definition of a convex function: Let $X$ be an interval of real numbers, i.e. $X = \{ x \in {\Bbb R} \ | \ -\infty \leq a < x < b \leq +\infty \}$ and $f$ be a function $f: X \rightarrow {\Bbb R}$. $f$ is called convex if, for any $x_1, x_2 \in X$ and $0 \leq t \leq 1$, $$f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2) \\$$ $\square$

Based on the discussion above, this just says that a convex function is one for which the secant line lies above the graph of the function between any two input values. This is equivalent to saying that the area in the $xy$-plane above the graph of $f$, called the epigraph of $f$, is a convex set, i.e. contains the line segment connecting any two points in the set.

These are also equivalent to Jensen's inequality, which I'll introduce and prove in the theorem below. The following definitions and lemma will formalize these intuitive notions and help us state the theorem.

Definition of epigraph: For a function $f: X \rightarrow {\Bbb R}$ where $X$ is an interval of real numbers, the epigraph of $f$ is the set $\text{epi}(f) = \{ (x,z) \in X \times {\Bbb R} \ | \ z \geq f(x) \}$.
$\square$

Definition of a convex set (in the plane): A set $C$ of points in the $xy$-plane is convex if, for any points $(x_1, y_1), (x_2, y_2) \in C$, and for any value of $t \in [0,1]$, the point $(tx_1 + (1-t)x_2, ty_1 + (1-t)y_2) \in C$ as well.
$\square$

Definition of convex combination/hull: For points $(x_1, y_1), (x_2, y_2), \dotsc, (x_n, y_n)$, a convex combination (also called an affine combination) is a point $(\sum_{i=1}^{n}{\lambda_i x_i} , \sum_{i=1}^{n}{\lambda_i y_i})$ where $\sum_{i=1}^{n}{\lambda_i}=1$. The convex hull of the $n$ points is the set of all convex combinations of the points.
$\square$

The convex hull is the smallest set containing the convex combinations of the points and is a polygon with some or all of the points as vertices. Note that this polygon is trivially a convex set.

 The convex hull of the points $P_0, P_1, \dotsc, P_5$. Note that $P_2$ does not contribute to the convex hull since it is a convex combination of the other 5 points.

Lemma: Let $C$ be a convex set, and let ${\bf v}_1 = (x_1, y_1), {\bf v}_2 = (x_2, y_2), \dotsc, {\bf v}_n = (x_n, y_n)$ be points in $C$. Then the convex hull of the ${\bf v}_i$'s is contained in $C$.

Proof: The proof is by induction. For the base case, note that the convex hull of any 2 points is the line segment connecting them, which is contained in $C$ by the definition of a convex set.

Now assume that the statement is true for convex hulls of $n-1$ points. Let $\psi = \sum_{i=1}^{n}{\lambda_i {\bf v}_i}$ be a convex combination of ${\bf v}_1, {\bf v}_2, \dotsc, {\bf v}_n$. If $\lambda_n = 1$, then the other $\lambda_i$'s are all zero (since their sum must be 1), and we only have a single point. In that case, there is nothing to prove, so assume $\lambda_n \neq 1$, and thus $1-\lambda_n \neq 0$. Then we can define the point $$\phi = \sum_{i=1}^{n-1}{\frac{\lambda_i}{1-\lambda_n} {\bf v}_i}$$ and rewrite $\psi$ as $$\psi = (1-\lambda_n) \phi + \lambda_n {\bf v}_n \tag{\star}$$ Since $\sum_{i=1}^{n-1}{\lambda_i} = 1-\lambda_n$, we have $\sum_{i=1}^{n-1}{\frac{\lambda_i}{1-\lambda_n}}=1$, from which it is clear that $\phi$ is a convex combination of ${\bf v}_1, {\bf v}_2, \dotsc, {\bf v}_{n-1}$, and so $\phi \in C$ by the induction hypothesis.

Finally, since ${\bf v}_n$ is also in $C$, equation $( \star )$ shows that the point $\psi$ is on the line segment connecting two points in $C$, and thus $\psi \in C$ since $C$ is a convex set.
$\square$

Using the lemma, we can prove the following theorem without breaking a sweat.

Theorem: For a function $f: X \rightarrow {\Bbb R}$ where $X$ is an interval of real numbers, the following 3 statements are equivalent:
(a) $\text{epi}(f)$ is a convex set.
(b) (Jensen's inequality) For any $\lambda_1, \lambda_2, \dotsc , \lambda_n \geq 0$ with $\sum_{i=1}^{n}{\lambda_i}=1$ and values $x_1, x_2, \dotsc , x_n \in X$, $$f \left( \sum_{i=1}^{n}{\lambda_i x_i} \right) \leq \sum_{i=1}^{n}{\lambda_i f(x_i)}$$ (c) $f$ is a convex function (as defined above).

Proof: In order to prove that the 3 statements are equivalent, we need to prove that any one implies any of the others. We'll show that (a) $\implies$ (b) $\implies$ (c) $\implies$ (a).

(a) $\implies$ (b):
Assume $\text{epi}(f)$ is a convex set. Note that the points $(x_i, f(x_i)) \in \text{epi}(f)$. By the lemma, the convex combination $$\sum_{i=1}^{n}{\lambda_i (x_i, f(x_i))} = \left( \sum_{i=1}^{n}{\lambda_i x_i}, \sum_{i=1}^{n}{\lambda_i f(x_i)} \right)$$ is also in $\text{epi}(f)$. By the definition of $\text{epi}(f)$, this means that $$\sum_{i=1}^{n}{\lambda_i f(x_i)} \geq f \left( \sum_{i=1}^{n}{\lambda_i x_i} \right)$$ (b) $\implies$ (c):
Take $n=2$ and $\lambda_1 = t, \lambda_2 = 1-t$ in Jensen's inequality.

(c) $\implies$ (a):
Let ${\bf v}_1 = (x_1, z_1), {\bf v}_2 = (x_2, z_2) \in \text{epi}(f)$ and $t \in [0,1]$. We need to prove that the point $\psi = t{\bf v}_1 + (1-t){\bf v}_2 = (tx_1 + (1-t)x_2, tz_1 + (1-t)z_2)$ in in $\text{epi}(f)$.

Since ${\bf v}_1, {\bf v}_2 \in \text{epi}(f)$, we know that $z_1 \geq f(x_1)$ and $z_2 \geq f(x_2)$. Since $t, 1-t \geq 0$, we can multiply these inequalities by $t$ and $1-t$ respectively and then add them together to obtain: $$tf(x_1) + (1-t)f(x_2) \leq tz_1 + (1-t)z_2$$ Furthermore, since $f$ is a convex function, we have $$f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2)$$ Combining these two inequalities yields $$f(tx_1 + (1-t)x_2) \leq tz_1 + (1-t)z_2$$ In other words, $\psi \in \text{epi}(f)$. This completes the proof.
$\square$

In addition to establishing Jensen's inequality (which will be used in the next post), the theorem above nicely connects the notion of a convex set with that of a convex function: convex functions are precisely those whose epigraphs are convex sets.

Before wrapping up this post, I want to point out two more statements equivalent to convexity if the function $f$ is (twice) differentiable. For fear of making a relatively simple post too long and proof-heavy, I'll omit the proofs, but like the facts above (other than perhaps Jensen's inequality), they are clear from a picture and familiar from calculus.

First, recall that if we draw the tangent line to the graph of $f(x)$ at the point $x=a$, we obtain a linear approximation of the function $f$, namely $L_{a}(x)=f(a) + f'(a)(x-a)$. Unless $f$ itself is linear, the approximation will have an error $E(x) = f(x) - L_{a}(x)$:

For a convex function, $E(x) \geq 0$. In other words, the tangent line at any point lies on or below the function itself.

Finally, we can see from the picture that as $x$ increases, the tangent line becomes steeper, i.e. $f'(x)$ is increasing (technically, non-decreasing). This means that $f''(x) \geq 0$. This is also equivalent to the definition of convexity via secant lines as long as the relevant derivatives are defined.

I hope this post was informative, and, as usual, I welcome all feedback in the comments section.