gtMath


Functions as Vectors (Part 1)

Posted by gtmath, Tuesday, November 15, 2016

Preliminaries:

Euclidean Space and Vectors: defines vector spaces and the dot product
How close is "close enough"?: defines metrics, metric spaces, and convergence
Sets of Functions: basic notation for function spaces
Convergence of Sequences of Functions: topology and metrizability of function spaces

The previous post covered convergence in function spaces, and we saw that different types of convergence correspond to different topologies on the underlying space.

In this post, I will add a few more tools (namely norms and inner products) to the vector space/linear algebra toolbox and show you how we introduce a vector space structure to function spaces. I will focus on the sequence spaces $\ell^p$ as illustrative examples, concluding with a proof of Hölder's inequality for sums.

In Part 2 of this post, I will introduce the notion of the dual space and prove an important result about the $\ell^p$ spaces to complete a recent reader request.

Engineers and physicists in the audience will recall that the function space formalism provides the mathematical foundation for Fourier analysis and quantum mechanics. It also comes into play in the modeling of randomness, on which I am planning some upcoming posts (stay tuned).

Vector space review

In "Euclidean space", I introduced vectors in ${\Bbb R}^n$ as ordered $n$-tuples, which, for $n=2$ and $n=3$, can be thought of as arrows with a length and direction. We saw that we can define addition of vectors as addition of the respective components, i.e. we apply the usual addition of real numbers to each component of the two vectors. We can also "scale" a vector by a number (called a scalar in this context) by multiplying each component by that number, once again in the usual real number sense.

In symbols (using $n=3$ for now), if ${\bf x} = (x_1 , x_2, x_3)$ and ${\bf y} = (y_1 , y_2 , y_3)$ are vectors, and $c$ is a scalar: $$
\begin{align}
{\bf x} + {\bf y} & \buildrel {\rm def} \over{=} (x_1 + y_1, x_2 + y_2, x_3 + y_3) \\[2mm]
c \, {\bf x} & \buildrel {\rm def} \over{=} (cx_1, cx_2, cx_3)
\end{align}
$$ ${\Bbb R}^n$ with addition and scalar multiplication defined in this way satisfies the vector space axioms (refer to the preliminary post).

If we have some set other than ${\Bbb R}^n$ and define a way to add its elements together and multiply them by a scalar (usually from the real numbers $\Bbb R$ or the complex numbers $\Bbb C$, but technically from any field), and if these definitions satisfy the vector space axioms, then the set endowed with these operations is called a vector space, and its elements are called vectors. Thus, the term vector encompasses a more general class of objects than the motivating example of arrows with 2 or 3 components.

The study of vector spaces is called linear algebra and is one of the best understood areas of mathematics. Verifying that a set endowed with a definition of addition and scalar multiplication is a vector space immediately implies that all the theorems proven about vector spaces (of which there are many) apply. As you may have already surmised, in this post, we'll be looking at vector spaces in which the vectors are functions, i.e. function spaces.

Inner products, norms, and the distance formula

Recall also from the Euclidean space post that we defined the dot product of two vectors in ${\Bbb R}^n$ as ${\bf x} \cdot {\bf y} = \sum_{i=1}^{n}{x_i y_i}$. The dot product satisfies 3 axioms which make it a so-called inner product (in fact, the dot product inspired the definition of inner products):

Symmetry*:
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf y} = {\bf y} \cdot {\bf x}$
Linearity in the first argument:
$\ \ \ \ \ \ \ \ (c \, {\bf x}) \cdot {\bf y} = c \, ({\bf x} \cdot {\bf y})$
$\ \ \ \ \ \ \ \ ({\bf x} + {\bf z}) \cdot {\bf y} = ({\bf x} \cdot {\bf y}) + ({\bf z} \cdot {\bf y})$
Positive-definiteness:
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf x} \geq 0$
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf x} = 0 \iff {\bf x} = {\bf 0}$

*Note: when dealing with vector spaces over the field of complex numbers instead of real numbers, the symmetry property is replaced with conjugate symmetry: ${\bf x} \cdot {\bf y} = \overline{{\bf y} \cdot {\bf x}}$, where the bar over the right side is complex conjugation: $\overline{a + bi} := a - bi$. We won't worry about complex vector spaces in this post.

That the dot product satisfies these properties is very easy to check. For example: $$
\begin{align}
({\bf x} + {\bf z}) \cdot {\bf y}
&= \sum_{i=1}^{n}{(x_i + z_i) y_i} \\
&= \sum_{i=1}^{n}{(x_i y_i + z_i y_i)} \\
&= \sum_{i=1}^{n}{x_i y_i} + \sum_{i=1}^{n}{z_i y_i} \\
&= ({\bf x} \cdot {\bf y}) + ({\bf z} \cdot {\bf y})
\end{align}
$$
A vector space with an inner product is called an inner product space. Inner products are often denoted $\langle {\bf x} , {\bf y} \rangle$, and I will use this notation for the remainder of this post.

If we have an inner product, we automatically get a way to specify the "size" or magnitude of a vector ${\bf x}$ by the definition $\| {\bf x} \| \buildrel \rm{def} \over{=} \sqrt{\langle {\bf x}, {\bf x} \rangle}$. This measure of magnitude satisfies 3 properties, as a direct consequence of the properties an inner product must satisfy, which make it a so-called norm:

Positive-definiteness:
$\ \ \ \ \ \ \ \ \| {\bf x} \| \geq 0$
$\ \ \ \ \ \ \ \ \|{\bf x} \| = 0 \iff {\bf x} = {\bf 0}$
Scaling:
$\ \ \ \ \ \ \ \ \| c\, {\bf x} \| = |c| \| {\bf x} \|$
Triangle inequality:
$\ \ \ \ \ \ \ \ \| {\bf x} + {\bf y}\| \leq \| {\bf x} \| + \| {\bf y} \|$

A vector space with a norm is called a normed vector space. The norm also gives us a way to measure the distance between two vectors by the definition $$
d({\bf x}, {\bf y}) \buildrel \rm{def} \over{=} \| {\bf y} - {\bf x} \|
$$ which, by the way, is automatically a metric due to the properties of norms. In the inner product space ${\Bbb R}^3$ (with the dot product as the inner product), this formula yields $$
d({\bf x}, {\bf y}) = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2}
$$ which is the well-known (Euclidean) distance formula.
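The chain "inner product gives norm gives metric" can be sketched in a few lines of Python. This is only an illustration for vectors in ${\Bbb R}^3$; the function names `inner`, `norm`, and `distance` are my own choices for this sketch and do not come from any library.

```python
import math

def inner(x, y):
    """Dot product of two equal-length sequences of real numbers."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def distance(x, y):
    """Metric induced by the norm: ||y - x||."""
    return norm([yi - xi for xi, yi in zip(x, y)])

x = [1.0, 2.0, 2.0]
y = [4.0, 6.0, 2.0]

print(norm(x))         # 3.0
print(distance(x, y))  # 5.0
```

Note that `distance` never mentions square roots or squares directly: it only uses `norm`, which in turn only uses `inner`. Swapping in a different inner product would change the norm and the metric automatically.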

The $\ell^2$ space

Let's start by considering a vector space that is a natural generalization of the familiar ${\Bbb R}^n$, the set of all $n$-dimensional "arrows", i.e. ordered $n$-tuples of real numbers ${\bf x} = (x_1, x_2, x_3, \dotsc , x_n)$. We made this set into a vector space by defining vector addition and scalar multiplication as the obvious component-wise operations. We'll now look at an infinite-dimensional analog of this space.

Consider the set of infinite sequences ${\bf x} = (x_1, x_2, x_3, \dotsc)$ of real numbers which are square-summable, i.e. $\sum_{i=1}^{\infty}{x_i^2} < \infty$. We'll see the reason for this restriction in a moment. This set, which we will call $\ell^2$ (I'll explain this notation later in the post), is a subset of ${\Bbb R}^{\Bbb N}$, the set of all real-valued sequences. We can give $\ell^2$ a vector space structure using the following definitions, completely analogous to the ${\Bbb R}^n$ ones:

The zero vector is the sequence of all zeros:

${\bf 0} = (0,0,0, \dotsc)$

Vector addition is defined component-wise:

$(x_1, x_2, x_3, \dotsc) + (y_1, y_2, y_3, \dotsc) \buildrel \rm{def} \over{=} (x_1 + y_1, x_2 + y_2, x_3 + y_3, \dotsc)$

Scalar multiplication is defined similarly:

$c \, (x_1, x_2, x_3, \dotsc) \buildrel \rm{def} \over{=} (cx_1, cx_2, cx_3, \dotsc)$

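It is routine to show that this set with these definitions satisfies the vector space axioms, and thus we can refer to these sequences as "vectors". The one point that needs a moment's thought is that the sum of two square-summable sequences is again square-summable, which follows from the inequality $(x_i + y_i)^2 \leq 2x_i^2 + 2y_i^2$.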
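To make "sequences are functions, and functions are vectors" concrete, here is a small Python sketch in which a sequence is represented as a function of its index $i$. The helper names (`add`, `scale`, `sum_of_squares`) and the two example sequences are assumptions made for this illustration only. The loop checks numerically that the partial sums of squares of ${\bf x} + {\bf y}$ stay below twice those of ${\bf x}$ plus twice those of ${\bf y}$.

```python
def add(x, y):
    """Component-wise sum of two sequences given as functions of the index."""
    return lambda i: x(i) + y(i)

def scale(c, x):
    """Component-wise scalar multiple of a sequence."""
    return lambda i: c * x(i)

def sum_of_squares(x, n):
    """Partial sum x(1)^2 + ... + x(n)^2."""
    return sum(x(i) ** 2 for i in range(1, n + 1))

def x(i):
    return 1.0 / i          # (1, 1/2, 1/3, ...)

def y(i):
    return 1.0 / (i + 1)    # (1/2, 1/3, 1/4, ...)

z = add(x, scale(3.0, y))   # the vector x + 3y
print(z(1))                 # 2.5

for n in (10, 100, 1000):
    lhs = sum_of_squares(add(x, y), n)
    rhs = 2 * sum_of_squares(x, n) + 2 * sum_of_squares(y, n)
    print(n, lhs, rhs, lhs <= rhs)
```

A finite computation like this cannot prove convergence, of course; it only shows the bound that the proof relies on holding at each truncation.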

Furthermore, since we are only considering sequences that are square-summable, thus excluding ones like $(1,1,1, \dotsc )$, we can define the $\ell^2$ norm of a vector/sequence to be $$
\|{\bf x}\|_{2} \buildrel \rm{def} \over{=} \sqrt{\sum_{i=1}^{\infty}{x_i^2}}
$$ and know that this infinite sum converges to a finite value. Once again, this definition is very similar to the formula in ${\Bbb R}^n$. In the latter case, the norm was induced by the inner product in the sense that $\| {\bf x} \| = \sqrt{\langle {\bf x},{\bf x}\rangle}$. This is also the case for the norm in our space of sequences if we define the inner product using the formula $$
\langle {\bf x},{\bf y} \rangle \buildrel \rm{def} \over{=} \sum_{i=1}^{\infty}{x_i y_i}
$$ It is obvious that the above formula indeed defines an inner product, but there is one potential issue with this definition: we don't know a priori that the infinite sum on the right-hand side actually converges for any two square-summable sequences ${\bf x}$ and ${\bf y}$. That it does is a consequence of a version of the Cauchy-Schwarz inequality for infinite sums; at the end of the post, I will prove the more general Hölder's inequality, so for now, take my word for it that the series does converge, and thus that this inner product is well defined.
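As a numerical sanity check on the convergence claim, the sketch below computes partial sums of the inner product for two square-summable sequences and compares them with the Cauchy-Schwarz bound for the same truncation. The sequences $x_i = 1/i$ and $y_i = 1/(i+1)$ are my own choice of example; for them the inner product telescopes, $\sum_i \frac{1}{i(i+1)} = 1$, so the partial sums should creep up toward 1.

```python
import math

def partial_inner(x, y, n):
    """Sum of x(i) * y(i) for i = 1..n, with x and y given as functions of i."""
    return sum(x(i) * y(i) for i in range(1, n + 1))

def x(i):
    return 1.0 / i          # square-summable

def y(i):
    return 1.0 / (i + 1)    # square-summable

for n in (10, 100, 1000, 10000):
    ip = partial_inner(x, y, n)
    # Cauchy-Schwarz bound for the truncated sequences
    bound = math.sqrt(partial_inner(x, x, n) * partial_inner(y, y, n))
    print(n, ip, bound)
```

The partial sums of the inner product are increasing here (all terms are positive) and stay below a bound that itself converges, which is exactly the mechanism that makes the infinite sum well defined.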

The $\ell^p$ spaces

The more general analogs of the $\ell^2$ space are the $\ell^p$ spaces, consisting of all sequences ${\bf x} = (x_1, x_2, x_3, \dotsc )$ for which $\sum_{i=1}^{\infty}{|x_i|^p} < \infty$. The larger we make $p$, the easier it is for the series to converge, as the terms with absolute value less than 1 "drop off" more quickly. The $p$-series for ${\bf x} = (1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \dotsc )$ illustrate this concept perfectly (you may remember these from your Calc II class): the series $\sum_{i=1}^{\infty}{1/i^p}$ diverges for $p \leq 1$ (for $p=1$ it is the harmonic series) and converges for $p > 1$, so this ${\bf x}$ belongs to $\ell^p$ for every $p>1$ but not to $\ell^1$.

As a result, $\ell^p \subset \ell^s$ for $p<s$; put differently, $\ell^s$ contains more sequences than $\ell^p$, since the larger exponent $s$ makes it easier for the series $\sum_{i}{|x_i|^s}$ to converge.
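The difference between $p=1$ and $p=2$ for the sequence $(1, \tfrac{1}{2}, \tfrac{1}{3}, \dotsc)$ is easy to see numerically. The following Python sketch (the function name `p_series_partial_sum` is just a label chosen for this example) prints partial sums for both exponents; the $p=1$ column keeps growing roughly like $\ln n$, while the $p=2$ column settles near $\pi^2/6 \approx 1.645$.

```python
def p_series_partial_sum(p, n):
    """Partial sum 1/1^p + 1/2^p + ... + 1/n^p."""
    return sum(1.0 / i ** p for i in range(1, n + 1))

for n in (10, 1000, 100000):
    print(n, p_series_partial_sum(1, n), p_series_partial_sum(2, n))
```

So this particular sequence lies in $\ell^2$ but not in $\ell^1$, which is one concrete witness that the inclusion $\ell^1 \subset \ell^2$ is strict.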

We can define the $\ell^p$ norms by the following formula for $1 \leq p < \infty$: $$
\| {\bf x} \|_{p} \buildrel{\rm def} \over{=} \left( \sum_{i=1}^{\infty}{|x_i|^p} \right)^{1/p}
$$ When $0<p<1$, this formula fails to define a norm, because it doesn't satisfy the triangle inequality: as a counterexample, take ${\bf x} = (1,0,0,0, \dotsc )$ and ${\bf y} = (0,1,0,0, \dotsc )$. Then $$
\| {\bf x} + {\bf y} \|_{p} = \| (1,1,0,0, \dotsc ) \|_{p} = 2^{1/p} > 2 = 1 + 1 = \| {\bf x} \|_{p} + \| {\bf y} \|_{p}
$$ so $\| {\bf x} + {\bf y} \|_{p} \nleq \| {\bf x} \|_{p} + \| {\bf y} \|_{p}$ in this case. Geometrically, the counterexample illustrates that when $0 < p < 1$, the unit ball (i.e. the open ball of radius 1 centered at ${\bf 0}$) is not convex: it contains points which we can connect with a straight line, and that straight line will contain points outside the unit ball:

Unit balls in the $\ell^p$ norms (on ${\Bbb R}^2$) for various values of $p$ - from Wikipedia
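The counterexample can also be checked numerically in two dimensions. The sketch below applies the $p$-norm formula (as a plain function `p_norm`, a name chosen for this example) to the vectors $(1,0)$ and $(0,1)$ and to their midpoint $(\tfrac{1}{2}, \tfrac{1}{2})$. For $p = \tfrac{1}{2}$ the "norm" of the sum exceeds the sum of the "norms", and the midpoint of two unit vectors has "norm" greater than 1, i.e. it falls outside the unit ball.

```python
def p_norm(x, p):
    """The formula (sum |x_i|^p)^(1/p); a genuine norm only when p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [1.0, 0.0]
y = [0.0, 1.0]
total = [1.0, 1.0]      # x + y
midpoint = [0.5, 0.5]   # (x + y) / 2

for p in (0.5, 1.0, 2.0):
    print(p, p_norm(total, p), p_norm(x, p) + p_norm(y, p), p_norm(midpoint, p))
```

For $p = 1$ and $p = 2$ the same computation shows the triangle inequality holding and the midpoint staying inside the closed unit ball.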

For this reason, we will restrict our attention to $p \geq 1$ so that the formula above actually defines a norm (the triangle inequality for these norms is known as Minkowski's inequality; in the interest of brevity, I'll omit the proof). Finally, the diagram above shows a unit ball for $p=\infty$, which needs to be treated specially. The $\ell^{\infty}$ space is just the set of bounded sequences with the norm $$
\| {\bf x} \|_{\infty} \buildrel{\rm def} \over{=} \sup_{i}{|x_i|}
$$ where this supremum (a synonym for "least upper bound") is guaranteed to exist since the sequences in the space are bounded by definition. Thus in the diagram, we see that the $\ell^{\infty}$ norm in the plane has a unit ball consisting of points for which the absolute values of the coordinates have a least upper bound of at most 1. In other words, in two dimensions, the $\ell^{\infty}$ unit ball consists of points whose $x$- and $y$-coordinates both have absolute value less than or equal to 1, i.e. a square.

Note: We've actually already seen the metric induced by the $\ell^{\infty}$ norm, $\| {\bf x} - {\bf y}\|_{\infty}$, as the uniform distance. Thus convergence of a sequence of functions in the $\ell^{\infty}$ norm is the same as uniform convergence.

Now, you were probably wondering whether, like the $\ell^2$ norm, the other $\ell^p$ norms arise from inner products. Unfortunately, they do not.

Proposition: The $\ell^p$ norm is induced by an inner product if and only if $p=2$.

Proof: We already showed that the $\ell^2$ norm does arise from an inner product. To prove that $p=2$ is the only value for which this is true, note that if $\| \cdot \|$ is any norm arising from an inner product, then for any vectors ${\bf x}, {\bf y}$, we have $$
\begin{align}
\| {\bf x} + {\bf y} \|^2 &= \langle {\bf x} + {\bf y}, {\bf x} + {\bf y} \rangle
= \langle {\bf x}, {\bf x} \rangle
+ \langle {\bf x}, {\bf y} \rangle
+ \langle {\bf y}, {\bf x} \rangle
+\langle {\bf y}, {\bf y} \rangle \\

\| {\bf x} - {\bf y} \|^2 &= \langle {\bf x} - {\bf y}, {\bf x} - {\bf y} \rangle
= \langle {\bf x}, {\bf x} \rangle
- \langle {\bf x}, {\bf y} \rangle
- \langle {\bf y}, {\bf x} \rangle
+\langle {\bf y}, {\bf y} \rangle
\end{align}
$$ Thus, $$
\| {\bf x} + {\bf y} \|^2 + \| {\bf x} - {\bf y} \|^2
= 2 \langle {\bf x}, {\bf x} \rangle
+ 2 \langle {\bf y}, {\bf y} \rangle
= 2 \| {\bf x} \|^2 + 2 \| {\bf y} \|^2
$$ This is known as the parallelogram law, a generalization of the Pythagorean theorem for right triangles.

Vectors involved in the parallelogram law - from Wikipedia

Applying this to the $\ell^p$ norm with ${\bf x} = (1,0,0,0, \dotsc )$ and ${\bf y} = (0,1,0,0, \dotsc )$, we obtain $$
\begin{align}
&&2^{2/p} + 2^{2/p} &= 2 \cdot 1^{2/p} + 2 \cdot 1^{2/p} \\
&\implies &2 \cdot 2^{2/p} &= 2+2 \\
&\implies &2^{(2/p+1)} &= 4 = 2^2 \\
&\implies &2/p+1 &=2 \\
&\implies &p &=2
\end{align}
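$$ The same two vectors also rule out $p=\infty$: there $\| {\bf x} + {\bf y} \|_{\infty} = \| {\bf x} - {\bf y} \|_{\infty} = 1$, so the left side of the parallelogram law is $2$ while the right side is $4$. $\square$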

So $\ell^2$ is the only inner product space of the $\ell^p$ family.
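The proposition can be illustrated numerically by measuring how badly the parallelogram law fails for the two vectors used in the proof. In the sketch below, `parallelogram_gap` (a name invented for this example) returns the left side of the law minus the right side; it should be zero, up to floating-point rounding, only when $p = 2$.

```python
def p_norm(x, p):
    """The p-norm (sum |x_i|^p)^(1/p) of a finite list of numbers."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def parallelogram_gap(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2, computed in the p-norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * p_norm(x, p) ** 2 - 2 * p_norm(y, p) ** 2)

x = [1.0, 0.0]
y = [0.0, 1.0]

for p in (1.0, 1.5, 2.0, 3.0, 10.0):
    print(p, parallelogram_gap(x, y, p))
```

For these two vectors the gap equals $2 \cdot 2^{2/p} - 4$, which is positive for $p < 2$, zero at $p = 2$, and negative for $p > 2$, matching the algebra in the proof.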

The $L^p$ spaces

The $\ell^p$ spaces are all subsets of ${\Bbb R}^{\Bbb N}$, the space of real-valued sequences. Naturally, these spaces have uncountably infinite analogs which are subsets of ${\Bbb R}^{\Bbb R}$, the space of real-valued functions taking inputs along the entire number line (instead of just $1,2,3,\dotsc$).

For $1 \leq p < \infty$, the $L^p$ space is defined as the set of functions $f: {\Bbb R} \rightarrow {\Bbb R}$ for which $$
\int_{-\infty}^{\infty}{|f(x)|^p \, dx} < \infty
$$ with the norm $$
\| f \|_{p} \buildrel{\rm def} \over{=} \left( \int_{-\infty}^{\infty}{|f(x)|^p \, dx} \right)^{1/p}
$$ The $L^{\infty}$ space is also defined analogously to $\ell^{\infty}$, but with the supremum replaced by the essential supremum (see below). Finally, like the discrete case, the only $L^p$ norm which is induced by an inner product is the $L^2$ norm, with the inner product $$
\langle f,g \rangle \buildrel{\rm def} \over{=} \int_{-\infty}^{\infty}{f(x)g(x) \, dx}
$$ So basically, the $L^p$ spaces are the same as the $\ell^p$ spaces, but with sequences replaced by functions of the entire number line and, accordingly, sums replaced by integrals. However, there are a number of complications which arise when we move from discrete to continuous inputs, namely:

The integral is understood to be a Lebesgue integral instead of the usual Riemann integral from calculus. The two agree whenever they are both defined, but the Lebesgue integral is defined for many more functions than the Riemann integral. A rigorous definition requires measure theory, which tells us how to define the measure of a given set of input values. The Lebesgue measure on the real number line is designed such that the measure of an interval $(a,b)$ is $b-a$.
The integral is not affected by changes in the function value on a set of measure zero. Any finite set of points has Lebesgue measure zero. Furthermore, any countably infinite set of points, such as the set of all rational numbers, also has measure zero.
Because of the last bullet, the members of the $L^p$ spaces are technically not functions, but rather equivalence classes of functions, where the equivalence relation is $$f \sim g \iff f=g \ \ {\rm a.e.}$$ where "a.e." (almost everywhere) means everywhere, except possibly on a set of measure zero.
The $L^{\infty}$ norm of a function (technically, an equivalence class of functions) $f$ is defined as the essential supremum of $f$. The essential supremum is the supremum, or least upper bound, of $f$, except possibly on a set of measure zero. For example, if $f(x) = 0$ for $x \neq 5$ and $f(5)=1$, then the supremum of $f$ is $1$, but the essential supremum of $f$ is $0$ since $f \leq 0$ except on the set $\{ 5 \}$, which is a single point and thus has measure zero.

Given the complexity of going into measures, the construction of the Lebesgue integral, and various Lebesgue integral convergence theorems, I won't delve further into the $L^p$ spaces in this post.

The $\ell^p$ spaces are enough to illustrate that function spaces (sequence spaces are a form of function spaces, just with a discrete set of input values) can possess a vector space structure as well as a norm and, for $p=2$, an inner product. Thinking of functions as members of a normed vector space is subtle, but as mentioned at the beginning of the post, it provides the mathematical foundation for numerous applications, some of which I hope to explore in future posts.

Part 2 of this post will explore linear functionals and duality, once again focusing on the $\ell^p$ spaces as a representative example.

I will conclude this post with the deferred proof of Hölder's inequality, but first, we'll need a lemma known as Young's inequality. For both of the proofs below, let $p$ and $q$ be positive real numbers satisfying $\frac{1}{p}+\frac{1}{q}=1$ (which forces $p, q > 1$), known as Hölder conjugates.

Lemma (Young's inequality for products): For any two non-negative real numbers $\alpha$ and $\beta$, we have $$
\alpha \beta \leq \frac{\alpha ^ p}{p} + \frac{\beta ^ q}{q}
$$ with equality if and only if $\beta = \alpha^{p-1}$.

Proof: Note that $$
\begin{align}
& &\frac{1}{p}+\frac{1}{q}&=1 \\[2mm]
&\implies &q+p &= pq \\[2mm]
&\implies &q+p + (1-p-q) &= pq+(1-p-q) \\[2mm]
&\implies &1 &= (p-1)(q-1) \\[2mm]
&\implies &\frac{1}{p-1} &= q-1
\end{align}
$$ so that for any non-negative numbers $t$ and $u$, we have $u=t^{p-1} \iff t=u^{q-1}$. In other words, $u(t)=t^{p-1}$ and $t(u)=u^{q-1}$ are inverse functions.

Let $\alpha, \beta \geq 0$. If either one is zero, the inequality is trivially true, so assume they are both positive. Then we have $$
\alpha \beta \leq \color{blue}{\int_{0}^{\alpha}{t^{p-1} \, dt}} + \color{red}{\int_{0}^{\beta}{u^{q-1} \, du}}
= \frac{\alpha^p}{p} + \frac{\beta^q}{q}
$$ The inequality follows from the fact that $\alpha \beta$ is the area of the rectangle in the figure below, whose top edge is at $u=\beta$, below $u(\alpha) = \alpha^{p-1}$ (since $u(t)$ is an increasing function). When $\beta=\alpha^{p-1}$, the inequality is an equality, since the rectangle's top edge would coincide with the upper tip of the blue region.

Note that the inequality is still true if $\beta > \alpha^{p-1}$, since in that case, $\alpha < \beta^{q-1}$; since $t(u)=u^{q-1}$ is also an increasing function, this would result in extra area of the red zone sticking out to the right of the $\alpha \beta$-rectangle.
$\square$
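Young's inequality, including its equality case, is easy to spot-check numerically. The Python sketch below defines `young_gap` (a name chosen for this illustration) as the right side minus the left side, with $q$ computed from $p$ as $q = p/(p-1)$. The gap should never be negative, and it should vanish when $\beta = \alpha^{p-1}$.

```python
def young_gap(alpha, beta, p):
    """alpha^p / p + beta^q / q - alpha * beta, where 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)  # Hölder conjugate of p
    return alpha ** p / p + beta ** q / q - alpha * beta

# Spot-check a grid of non-negative alpha and beta for a few values of p.
values = [0.0, 0.5, 1.0, 2.0, 5.0]
for p in (1.5, 2.0, 3.0):
    smallest = min(young_gap(a, b, p) for a in values for b in values)
    print(p, smallest)

# Equality case: beta = alpha^(p-1); here p = 3, alpha = 2, beta = 4.
print(young_gap(2.0, 4.0, 3.0))
```

A grid check like this is not a proof, but it is a quick way to catch a mis-stated exponent or equality condition.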

Hölder's inequality: Let ${\bf x} \in \ell^p$ and ${\bf y} \in \ell^q$. Then $$
\sum_{i=1}^{\infty}{|x_i y_i|} \leq
\left( \sum_{i=1}^{\infty}{|x_i|^p} \right)^{1/p} \left( \sum_{i=1}^{\infty}{|y_i|^q} \right)^{1/q}
$$ Before giving the proof, I want to point out that

If $p=q=2$ (this choice of $p$ and $q$ does satisfy the assumption $\tfrac{1}{p}+\tfrac{1}{q}=1$), then Hölder's inequality is just the infinite-sum version of the Cauchy-Schwarz inequality.
The statement of Hölder's inequality can also be written in terms of the $p$-norms: $$\| {\bf z} \|_1 \leq \| {\bf x} \|_{p} \| {\bf y} \|_{q} $$ where ${\bf z}$ is the sequence whose $i$-th component is $z_i = x_i y_i$. So the inequality also implies that ${\bf z} \in \ell^1$.

Proof of Hölder's inequality: If either ${\bf x}$ or ${\bf y}$ is the zero vector, then the inequality is trivially true, so assume both are non-zero. Then $\|{\bf x}\|_{p}$ and $\|{\bf y}\|_{q}$ are non-zero, and so we can define the unit vectors ${\bf u} = \frac{1}{\|{\bf x}\|_p} {\bf x}$ and ${\bf v} = \frac{1}{\|{\bf y}\|_q} {\bf y}$. Then, by Young's inequality, $$
|u_i v_i| \leq \frac{|u_i|^p}{p} + \frac{|v_i|^q}{q} \tag{$\star$}
$$ for all $i \in {\Bbb N}$.

Since ${\bf x}$ (and thus ${\bf u}$) is in $\ell^p$, and similarly, ${\bf v} \in \ell^q$, the series $\sum_{i=1}^{\infty}{|u_i|^p}$ and $\sum_{i=1}^{\infty}{|v_i|^q}$ both converge. Using the comparison test and $( \star )$, we can conclude that the series $\sum_{i=1}^{\infty}{|u_i v_i|}$ also converges, and thus the sequence ${\bf w} = (u_i v_i)_{i \in {\Bbb N}}$ is in $\ell^1$.

Since ${\bf z} = \|{\bf x}\|_{p} \|{\bf y}\|_{q} {\bf w}$ (i.e. ${\bf z}$ is a scalar multiple of ${\bf w}$), ${\bf z} \in \ell^1$ as well.

Finally, by summing both sides of $( \star )$ from $i=1$ to $\infty$ and using the fact that ${\bf u}$ and ${\bf v}$ are unit vectors, we obtain$$
\|{\bf w}\|_{1} \leq \frac{1}{p}\|{\bf u}\|_{p}^{p} + \frac{1}{q}\|{\bf v}\|_{q}^{q} = \frac{1}{p}+\frac{1}{q} = 1
$$ and thus $$
\|{\bf z}\|_{1} = \|{\bf x}\|_{p} \|{\bf y}\|_{q} \|{\bf w}\|_{1} \leq \|{\bf x}\|_{p} \|{\bf y}\|_{q}
$$ $\square$
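Hölder's inequality holds for finite sums as well, so it can be checked on truncations of infinite sequences. The sketch below uses $p = 3$, $q = \tfrac{3}{2}$ and the example sequences $x_i = y_i = 1/i$, which are my own choice (this sequence lies in both $\ell^3$ and $\ell^{3/2}$, since $\sum 1/i^3$ and $\sum 1/i^{3/2}$ converge). It compares $\sum_i |x_i y_i|$ with $\|{\bf x}\|_p \|{\bf y}\|_q$ for increasing truncation lengths.

```python
def p_norm(x, p):
    """The p-norm (sum |x_i|^p)^(1/p) of a finite list of numbers."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def holder_sides(x, y, p):
    """Return (sum |x_i y_i|, ||x||_p * ||y||_q) with q the Hölder conjugate of p."""
    q = p / (p - 1.0)
    lhs = sum(abs(xi * yi) for xi, yi in zip(x, y))
    rhs = p_norm(x, p) * p_norm(y, q)
    return lhs, rhs

for n in (10, 100, 1000):
    x = [1.0 / i for i in range(1, n + 1)]
    y = [1.0 / i for i in range(1, n + 1)]
    lhs, rhs = holder_sides(x, y, 3.0)
    print(n, lhs, rhs, lhs <= rhs)
```

Calling `holder_sides(x, y, 2.0)` instead gives the two sides of the Cauchy-Schwarz inequality for the same truncations.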

There are also (Lebesgue) integral versions of Young's inequality and Hölder's inequality which are used for the $L^p$ spaces, but the discrete versions essentially give us the same picture without requiring the machinery of measure theory.

I hope this post was helpful, and stay tuned for Part 2.