
Functions as Vectors (Part 2): The Dual Space

Preliminaries: Functions as Vectors (Part 1), Basis and Dimension

In this post, I'll introduce linear operators and functionals and the notion of duality to the vector space toolbox, focusing again on ${\Bbb R}^n$ and the $\ell^p$ spaces from the last post.

While this content has numerous practical applications, for the purposes of this post, I'm going to focus on the "pure math" aspect, i.e. the study of the topic for the sake of satisfying intellectual curiosity; however, if there is reader interest, I am considering future posts on applications to physics (e.g. quantum mechanics) once I get through the already-outstanding reader requests. To that end, this post will culminate with the (reader-requested) proof of a theorem about the $\ell^p$ spaces' relationship with their dual spaces.


Linear Functionals


Recall that the $\ell^p$ spaces are vector spaces over ${\Bbb R}$ (or ${\Bbb C}$, but once again, we will focus on vector spaces over the real numbers). This means that the addition and scaling of vectors are compatible with the properties of real number addition and multiplication (if this is not clear, see this post), so that an expression like $a{\bf x} + b{\bf y}$, where $a,b$ are numbers/scalars and ${\bf x}, {\bf y}$ are sequences/vectors in $\ell^p$, is a new vector in $\ell^p$.

We can define functions from a vector space to another vector space (or itself); these are called operators. Let $V$ and $W$ be vector spaces (over the real numbers), and let $T:V \rightarrow W$ be an operator. So $T$ takes input vectors ${\bf v}$ from $V$ and maps them to output vectors ${\bf w}$ in $W$. If for all $c \in {\Bbb R}$, ${\bf v}_1, {\bf v}_2 \in V$, the following hold: $$
\begin{align}
T({\bf v}_1 + {\bf v}_2) &= T({\bf v}_1) + T({\bf v}_2) \tag{1} \\
T(c{\bf v}_1) &= c \, T({\bf v}_1) \tag{2}
\end{align}
$$ then $T$ is called a linear operator. These generalize linear functions on the vector space ${\Bbb R}$ (a vector space in its own right over itself) of the form $f(x) = mx$. Note that condition (2) disqualifies linear functions of the form $f(x) = mx + b$ where $b \neq 0$ from being considered linear operators in the vector space sense since then $f(0) \neq 0$.
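
To make conditions (1) and (2) concrete, here is a small Python sketch (illustrative only; the helper name `is_linear` is my own) that spot-checks linearity on the one-dimensional vector space ${\Bbb R}$, confirming that $f(x) = mx$ passes while $f(x) = mx + b$ with $b \neq 0$ fails:

```python
# Spot-check the linearity conditions (1) and (2) for maps on R,
# viewed as a vector space over itself.

def is_linear(f, samples, tol=1e-9):
    """Check additivity and homogeneity on a few sample points."""
    for v1 in samples:
        for v2 in samples:
            if abs(f(v1 + v2) - (f(v1) + f(v2))) > tol:   # condition (1)
                return False
        for c in samples:
            if abs(f(c * v1) - c * f(v1)) > tol:          # condition (2)
                return False
    return True

samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
assert is_linear(lambda x: 4 * x, samples)          # f(x) = mx passes
assert not is_linear(lambda x: 4 * x + 1, samples)  # f(x) = mx + b, b != 0, fails
```

Of course, a finite spot-check cannot prove linearity, but it is enough to catch the affine counterexample above.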

Now, if $W$ is the underlying field of scalars of $V$, i.e. $W = {\Bbb R}$ since we're only talking about real vector spaces for now, then $T$ is called a linear functional. While we often use capital letters like $T$ for linear operators between vector spaces, it is common practice to use lower-case Greek letters such as $\phi$ and $\psi$ for linear functionals.


Bounded/Continuous Linear Functionals


In the preliminary post, I mentioned that a norm (measure of the size of a vector) gives us a way to measure the distance between two vectors ${\bf x}$ and ${\bf y}$. Namely, the distance is the size of the difference vector: $d({\bf x}, {\bf y}) = \| {\bf x} - {\bf y} \|$, and this formula coincides with the distance formula for the familiar "arrow vectors" of Euclidean space. This notion of distance allows us to define continuity of linear operators similarly to how we define continuity of functions in calculus.

In calculus, a function is continuous if, when the input values are close enough to a specific value $x_0$, the output values are close to $f(x_0)$. In symbols, given a tolerance $\epsilon$ (typically a small positive number), we need to be able to provide a number $\delta$ such that $|f(x)-f(x_0)| < \epsilon$ whenever $|x-x_0|<\delta$. Here, $\delta$ can depend on both $\epsilon$ and $x_0$, but if it does not depend on $x_0$, then $f$ is called uniformly continuous.

Similarly, an operator $T$ between normed vector spaces $V$ and $W$ is continuous at ${\bf x}_0 \in V$ if for any $\epsilon>0$, there exists a $\delta>0$ such that $$
\|T({\bf x})-T({\bf x}_0)\|_W < \epsilon
$$ whenever $$
\|{\bf x} - {\bf x}_0 \|_V < \delta
$$ Separately, an operator $T$ is called bounded if there exists a bound $M$ on how much $T$ "blows up" the size of an input vector. In symbols, $T$ is bounded if there exists an $M>0$ such that for all ${\bf x} \in V$, $$
\|T({\bf x})\|_W \leq M \| {\bf x} \|_V
$$ Note that here, $M$ does not depend on the choice of ${\bf x}$. Furthermore, the smallest (technically, the least upper bound) $M$ such that this holds is called the operator norm of $T$ and is denoted $\| T \|_{\rm op}$. Thus, it is always the case that for any ${\bf x} \in V$, $\|T({\bf x})\|_W \leq \|T\|_{\rm op} \|{\bf x}\|_V$. We will use this later in the post.
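
As a concrete (and entirely illustrative) example, consider the functional $\phi({\bf x}) = {\bf a} \cdot {\bf x}$ on ${\Bbb R}^2$ with the Euclidean norm; the Cauchy-Schwarz inequality gives $|\phi({\bf x})| \leq \|{\bf a}\|_2 \|{\bf x}\|_2$, with equality at ${\bf x} = {\bf a}/\|{\bf a}\|_2$, so $\|\phi\|_{\rm op} = \|{\bf a}\|_2$. A quick Python sketch:

```python
import math
import random

a = [3.0, -4.0]                                  # so ||a||_2 = 5
norm_a = math.sqrt(sum(t * t for t in a))

def phi(x):
    """The linear functional phi(x) = a . x."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Random vectors never violate the bound M = ||a||_2 ...
random.seed(0)
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in a]
    nx = math.sqrt(sum(t * t for t in x))
    assert abs(phi(x)) <= norm_a * nx + 1e-9

# ... and x = a/||a||_2 attains it, so ||phi||_op = ||a||_2 = 5.
x_star = [t / norm_a for t in a]
assert abs(abs(phi(x_star)) - norm_a) < 1e-9
```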

It turns out that when $T$ is linear, boundedness and continuity are equivalent. To simplify the notation, I'll focus on the case of linear functionals (i.e. where $W = {\Bbb R}$ and $\| \cdot \|_W$ is absolute value), but the same proof holds for general $W$'s as well.

Theorem: For a linear functional $\phi$ on a normed vector space $V$, the following are equivalent:

  1. $\phi$ is bounded.
  2. $\phi$ is continuous.
  3. $\phi$ is continuous at ${\bf 0}$.

Proof: To prove the 3 statements are equivalent, we will prove that $1 \implies 2$, $2 \implies 3$, and $3 \implies 1$.

$1 \implies 2$:
Since $\phi$ is linear and bounded, there exists an $M>0$ such that $$
| \phi({\bf x}) - \phi({\bf y}) | = | \phi( {\bf x} - {\bf y})| \leq M \| {\bf x} - {\bf y} \|
$$ Therefore, for any $\epsilon>0$, let $\delta < \epsilon / M$; then the above implies that $| \phi({\bf x}) - \phi({\bf y})| < \epsilon$ whenever $\| {\bf x} - {\bf y} \| < \delta$. So $\phi$ is continuous.

$2 \implies 3$:
If $\phi$ is continuous everywhere, then in particular, it is continuous at ${\bf 0}$.

$3 \implies 1$:
Since $\phi$ is linear, $\phi({\bf 0})=0$. Suppose $\phi$ is continuous at ${\bf 0}$. Then for $\epsilon = 1$, there exists a $\delta>0$ such that $|\phi({\bf x})|<1$ whenever $\|{\bf x}\|<\delta$. For any nonzero ${\bf x} \in V$, define ${\bf u} = \tfrac{1}{2} \delta \tfrac{{\bf x}}{\| {\bf x} \|}$, so that $\| {\bf u} \| = \tfrac{1}{2} \delta$. Thus, $$
1 > |\phi({\bf u})| = \left| \phi \left( \frac{\delta}{2\|{\bf x}\|} {\bf x} \right) \right| = \frac{\delta}{2\|{\bf x}\|} |\phi({\bf x})|
$$ which implies that $|\phi({\bf x})|< \tfrac{2}{\delta}\|{\bf x}\|$. Thus, $\phi$ is bounded with $M=\tfrac{2}{\delta}$.
$\square$

Since the $\delta$ in the first part of the proof does not depend on ${\bf x}$, boundedness is actually equivalent to uniform continuity.

Now, if $V$ is finite-dimensional, then all linear functionals are bounded/continuous. In an infinite-dimensional space, unbounded/discontinuous operators do exist; however, in many spaces, such as $\ell^2$, their existence cannot be shown constructively, but rather is proved using the axiom of choice. This means that in practice, any linear functional which you can think up, and which is defined on the entire $\ell^2$ space (see note below), is bounded.

Note: More specifically, when a space contains the limits of all sequences which "should" converge, i.e. those whose points eventually become arbitrarily close together (known as Cauchy sequences), it is called complete. This basically means that it has no "holes". The real numbers and the $\ell^p$ spaces are complete, while the rational numbers are not, since a Cauchy sequence like $3, 3.1, 3.14, 3.141, 3.1415, \dotsc$ "should" have the limit $\pi$, but this is not a rational number. In an incomplete space, we can sometimes explicitly construct an unbounded linear operator defined on the entire space, but in a complete space, we need the axiom of choice to prove that one exists.
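
A short Python sketch of this example: the truncated decimal expansions of $\pi$ are all rational, and consecutive terms get arbitrarily close together (the Cauchy property), even though the limit $\pi$ lies outside ${\Bbb Q}$:

```python
from fractions import Fraction

# Rational truncations of pi: 3, 31/10, 314/100, ...
digits = "31415926535897932384"
seq = [Fraction(int(digits[: k + 1]), 10 ** k) for k in range(len(digits))]

# Consecutive terms differ by at most 9 * 10**(-n) -- the Cauchy property.
for n in range(1, len(seq)):
    assert abs(seq[n] - seq[n - 1]) < Fraction(1, 10 ** (n - 1))
```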

However, it is certainly possible to define an unbounded linear operator on an incomplete subspace (called the domain of the operator) of a complete space: the derivative operator, defined on the subspace of $L^2$ consisting of differentiable functions whose derivatives are also in $L^2$, is linear and unbounded. While many important operators are unbounded, they are trickier to deal with since we always need to keep their domains in mind.


The Dual Space


Just as we showed for function spaces such as the $\ell^p$ spaces, we can define a vector space structure on the set of linear functionals on a vector space $V$. Vector addition is defined by $$
(\phi+\psi)({\bf x}) = \phi({\bf x}) + \psi({\bf x})
$$ and scalar multiplication by $$
(c\phi)({\bf x}) = c \phi({\bf x})
$$ With these operations, the set of all linear functionals on $V$ is a vector space in its own right, usually denoted $V^*$ or $V'$. This is called the (algebraic) dual space of $V$. If we exclude the unbounded linear functionals, then we obtain the continuous dual space of $V$, which avoids the domain issues associated with unbounded operators. For the remainder of this post, I will use the term dual space to refer to the continuous dual space.

Thus far, we have seen definitions and properties of linear functionals but have yet to see what they look like in practice. In the finite-dimensional case, suppose we have a vector space $V$ with basis ${\bf e}_1, {\bf e}_2, \dotsc, {\bf e}_n$. Then for some vector ${\bf x} = x_1 {\bf e}_1 + \dotsb + x_n {\bf e}_n$ and a linear functional $\phi$, the linearity of $\phi$ implies that \[
\begin{align}
\phi({\bf x}) &= \phi( x_1{\bf e}_1 + x_2{\bf e}_2 + \dotsb + x_n{\bf e}_n ) \\[1mm]
&= x_1 \phi({\bf e}_1) + x_2 \phi({\bf e}_2) + \dotsb + x_n \phi({\bf e}_n) \\[1mm]
&= a_1 x_1 + a_2 x_2 + \dotsb + a_n x_n \tag{$\spadesuit$}
\end{align}
\] where $a_i = \phi({\bf e}_i)$. So knowing the action of $\phi$ on the basis vectors tells us its effect on all other vectors by linearity. Since all linear functionals must have the form above, we can specify a linear functional entirely by the numbers $a_1, a_2, \dotsc, a_n$ which represent its evaluation on the basis vectors.
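
Here is a small Python sketch of $(\spadesuit)$: once we record the numbers $a_i = \phi({\bf e}_i)$, the functional's value on any vector is just the dot product with those numbers (the specific $\phi$ below is, of course, just an arbitrary example):

```python
def phi(x):
    """An arbitrary linear functional on R^3 (for illustration)."""
    return 2 * x[0] - x[1] + 5 * x[2]

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a = [phi(e) for e in basis]          # a_i = phi(e_i)

x = [7, -3, 2]
reconstructed = sum(ai * xi for ai, xi in zip(a, x))
assert reconstructed == phi(x)       # (spadesuit) in action
```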

Now, given a basis $\lbrace {\bf e}_i \rbrace$, we can define a corresponding dual basis, denoted $\lbrace {\bf e}^i \rbrace$, of the dual space where we define ${\bf e}^{j}(x_1 {\bf e}_1 + \dotsb + x_n {\bf e}_n) = x_j$. Note that the superscripts just represent labels, not exponents. The so-called dual basis spans $V^*$ since, by $( \spadesuit )$, any functional $\phi$ above has the representation $\phi = a_1 {\bf e}^1 + \dotsb + a_n {\bf e}^n$. Linear independence is also easy to prove.

Given the dual basis, the dual space $V^*$ is starting to look like the same thing as the original space $V$, just with different labels, i.e. it's starting to look like $V$ and $V^*$ are isomorphic (don't worry, I'll define this word rigorously below). Indeed, this is the case for finite-dimensional spaces. In the next section, I will show you that this isn't exactly the case with the infinite-dimensional $\ell^p$ spaces, but that a similar result holds.

Before we go there, a quick note on etymology: the dual space $V^*$ is called "dual" because it presents us with an alternative way to unambiguously specify a given vector in $V$. The element ${\bf e}^j$ of the dual basis essentially measures the ${\bf e}_j$-component of an input vector. Accordingly, if we know how all the ${\bf e}^j$'s (and thus all functionals) act on a vector ${\bf x}$, then we know the components of ${\bf x}$, i.e. we know which vector ${\bf x}$ is without ambiguity. The following proposition formalizes this idea.

Proposition: Let ${\bf x}, {\bf y} \in V$ be vectors, and assume that for all functionals $\phi \in V^*$, we have $\phi({\bf x})=\phi({\bf y})$. Then ${\bf x}={\bf y}$.

Proof: Define ${\bf z} = {\bf x} - {\bf y}$. Then given any functional $\phi \in V^*$, the assumption above and the linearity of $\phi$ imply that \[
\phi({\bf z}) = \phi({\bf x}-{\bf y}) = \phi({\bf x})-\phi({\bf y}) = 0 \tag{$\clubsuit$}
\] Assume ${\bf x} \neq {\bf y}$, i.e. ${\bf z} \neq {\bf 0}$. Since ${\bf z}$ is nonzero, $\lbrace {\bf z} \rbrace$ is a linearly independent set, so we can extend it to a basis $B_{\bf z} = \lbrace {\bf z}, {\bf e}_2, \dotsc, {\bf e}_n \rbrace$ for $V$.

For a generic vector ${\bf v} \in V$ with coordinates $(v_1, v_2, \dotsc, v_n)$ in the basis $B_{\bf z}$, define the functional $\phi$ by $\phi({\bf v}) = v_1$ (this is just ${\bf e}^1$ for the basis $B_{\bf z}$). Then $\phi$ is a functional with $\phi({\bf z})=1$, which contradicts $( \clubsuit )$. (In an infinite-dimensional normed space, the existence of a continuous functional with $\phi({\bf z}) \neq 0$ is guaranteed by the Hahn-Banach theorem, so the proposition also holds for the continuous dual space.) Therefore, it must be the case that ${\bf z} = {\bf 0}$ after all, i.e. ${\bf x} = {\bf y}$.
$\square$


The Dual of $\ell^p$


By a similar argument (to be formally justified in the proof below) as that presented above for a finite-dimensional vector space, all linear functionals $\phi \in \left( \ell^p \right)^*$ will take the form \[
\phi({\bf x}) = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dotsb \tag{$\spadesuit$}
\] for a generic vector ${\bf x} = (x_1, x_2, x_3, \dotsc) \in \ell^p$. Since we are talking about the continuous dual space, any such $\phi$ must also be bounded, which by definition means that \[
|\phi({\bf x})| \leq \| \phi \|_{\rm op} \| {\bf x} \|_p \tag{$\heartsuit$}
\] In other words, we are only dealing with functionals such that the series in $( \spadesuit )$ converges.

The linear functionals ${\bf e}^j$ defined by ${\bf e}^j({\bf x}) = x_j$ once again play the role of a standard dual basis, with the caveat that in this infinite-dimensional setting, the sum $a_1 {\bf e}^1 + a_2 {\bf e}^2 + a_3 {\bf e}^3 + \dotsb$ is understood as a limit of partial sums rather than a finite linear combination. We can then represent the functional $\phi = a_1 {\bf e}^1 + a_2 {\bf e}^2 + a_3 {\bf e}^3 + \dotsb$ by its coordinates in this basis: $\phi = (a_1, a_2, a_3, \dotsc )$.

In this form, an element of the dual space looks like another infinite sequence which, by $( \heartsuit )$, satisfies some sort of convergence condition. Thus, it stands to reason that we may be able to identify linear functionals, i.e. elements of $\left( \ell^p \right)^*$, with elements of one of the spaces $\ell^q$ for a suitable value of $q$.

We are going to make exactly such an identification, and in order to do so, we need to rigorously define what it means to "identify" an element of one vector space with an element of another. Suppose we have vector spaces $V$ and $W$ and a function $T: V \rightarrow W$ which maps elements of $V$ to elements of $W$. If $T$ is a bijection (also known as a one-to-one correspondence), which means that for each ${\bf w} \in W$, there is one and only one ${\bf v} \in V$ for which $T({\bf v})={\bf w}$, and $T$ preserves vector addition and scalar multiplication, i.e. \[
T(a{\bf v}_1 + b{\bf v}_2) = aT({\bf v}_1) + bT({\bf v}_2)
\] then we call $T$ a (vector space) isomorphism and write $V \cong W$.

The preservation of vector addition and scalar multiplication, the cornerstones of a vector space structure, amounts to nothing more than the definition of linearity, so an isomorphism is actually just a linear operator which is also a bijection. Furthermore, because of linearity, if a linear operator $T$ maps a basis of $V$ bijectively onto a basis of $W$, then we can already conclude that $T$ is an isomorphism. Finally, if an isomorphism preserves the norm/metric, i.e. $\|T({\bf v})\|_W = \|{\bf v}\|_V$ for all ${\bf v} \in V$, then it is called an isometry, and $V$ and $W$ are called isometrically isomorphic.

With all the terminology, definitions, and explanation from the preliminary posts and this post out of the way, we can finally state and prove the following theorem which answers Charles Stephens's Reader Request:

Theorem: Let $p$ and $q$ be Hölder conjugates, i.e. $\tfrac{1}{p}+\tfrac{1}{q}=1$, with $1 < p,q < \infty$. Then $\left( \ell^p \right)^*$ is isometrically isomorphic to $\ell^q$, where the norm on $\left( \ell^p \right)^*$ is understood to be the operator norm.

Proof: We need to show the existence of a bijective bounded linear operator $T: \ell^q \rightarrow (\ell^p)^*$ which is also an isometry. For a sequence ${\bf x} = (x_1, x_2, x_3, \dotsc) \in \ell^q$, define $T({\bf x})$ to be the functional $\phi_{\bf x} \in (\ell^p)^*$ which maps ${\bf y} = (y_1, y_2, y_3, \dotsc) \in \ell^p$ to the number \[
\phi_{\bf x}({\bf y}) = \sum_{i=1}^{\infty}{x_i y_i}
\] By Hölder's inequality, $|\phi_{\bf x}({\bf y})| \leq \|{\bf x}\|_q \|{\bf y}\|_p$, which implies that

  1. the sum specified by $\phi_{\bf x}({\bf y})$ converges, i.e. $\phi_{\bf x}$ is indeed an element of $(\ell^p)^*$, i.e. $T$ is well defined, and
  2. $\| \phi_{\bf x} \|_{\rm op} \leq \| {\bf x} \|_q$, so $T$ is bounded.
Furthermore, $T$ is clearly linear from its definition, so $T$ is a well defined, bounded linear operator.
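
Since Hölder's inequality is doing the heavy lifting here, a quick numerical spot-check (illustrative only, with the conjugate pair $p = 3$, $q = 3/2$) on finite-support sequences may be reassuring:

```python
import random

# Spot-check Hölder's inequality |sum x_i y_i| <= ||x||_q ||y||_p
# for the conjugate exponents p = 3, q = 3/2.
p, q = 3.0, 1.5
assert abs(1 / p + 1 / q - 1.0) < 1e-12

random.seed(1)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(50)]   # plays the role of x in l^q
    y = [random.uniform(-1, 1) for _ in range(50)]   # plays the role of y in l^p
    lhs = abs(sum(xi * yi for xi, yi in zip(x, y)))
    norm_x_q = sum(abs(t) ** q for t in x) ** (1 / q)
    norm_y_p = sum(abs(t) ** p for t in y) ** (1 / p)
    assert lhs <= norm_x_q * norm_y_p + 1e-9
```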

We will show that $T$ is bijective by showing that it has an inverse (can you see why being a bijection is equivalent to having an inverse?). Define the linear operator $U: (\ell^p)^* \rightarrow \ell^q$ by \[
U({\phi})= (\phi({\bf e}_1), \phi({\bf e}_2), \phi({\bf e}_3), \dotsc)
\] For notational simplicity, define $b_j = \phi({\bf e}_j)$, so that $U({\phi})={\bf b}$. Now, define \[
a_j = \begin{cases}
\frac{|b_j|^q}{b_j} & \text{if } j \leq n \\
0 & \text{if } j > n
\end{cases}
\] where $n$ is (for now) some fixed integer, and $a_j$ is interpreted to be $0$ if $b_j=0$. We want to show that $U$ is the inverse, i.e. "undoes" the action, of $T$, but first we need to show it is well defined and bounded (it is obviously linear by its definition). Since we'll be taking a limit as $n \rightarrow \infty$, we'll assume without loss of generality that $n$ is large enough that at least one of the $b_j$'s is non-zero; we know such an $n$ exists whenever $\phi \neq 0$, while if $\phi=0$, we already know $U(\phi)={\bf 0}$.

First of all, since the sequence ${\bf a} = (a_j)$ has only finitely many nonzero terms, ${\bf a}$ is certainly in $\ell^p$. Secondly, using the fact that $q=\tfrac{p}{p-1}$, so that $(q-1)p = q$ and thus $|a_j|^p = |b_j|^{(q-1)p} = |b_j|^q$, we find that \[
\| {\bf a} \|_p = \left( \sum_{j=1}^{n}{|b_j|^q} \right)^{1/p} \tag{$\dagger$}
\] Also, \[
\begin{align}
\phi({\bf a})
&= \phi \left( \left( \frac{|b_1|^q}{b_1}, \frac{|b_2|^q}{b_2}, \dotsc, \frac{|b_n|^q}{b_n}, 0, 0, 0, \dotsc \right) \right) \\[2mm]
&= \phi \left( \frac{|b_1|^q}{b_1} {\bf e}_1 + \frac{|b_2|^q}{b_2} {\bf e}_2 + \dotsb + \frac{|b_n|^q}{b_n} {\bf e}_n \right) \\[2mm]
&= \phi \left( \sum_{j=1}^{n}{\frac{|b_j|^q}{b_j} {\bf e}_j} \right) \\[2mm]
&= \sum_{j=1}^{n}{\phi \left( \frac{|b_j|^q}{b_j} {\bf e}_j \right)} \\[2mm]
&= \sum_{j=1}^{n}{\frac{|b_j|^q}{b_j} \phi \left( {\bf e}_j \right)} \\[2mm]
&= \sum_{j=1}^{n}{\frac{|b_j|^q}{b_j} b_j} \\[2mm]
&= \sum_{j=1}^{n}{|b_j|^q} \tag{$\ddagger$}
\end{align}
\] Therefore, \[
\begin{align}
\left( \sum_{j=1}^{n}{|b_j|^q} \right)^{1/q}
&= \left( \sum_{j=1}^{n}{|b_j|^q} \right)^{1-1/p} \\[2mm]
&= \frac{\sum_{j=1}^{n}{|b_j|^q}}{\left( \sum_{j=1}^{n}{|b_j|^q} \right)^{1/p}} \\[2mm]
&= \frac{|\phi({\bf a})|}{\|{\bf a}\|_p} \tag{by $\dagger, \ddagger$}\\[2mm]
&\leq \| \phi \|_{\rm op}
\end{align}
\] Since this holds for all $n$ (large enough as mentioned above), we can take the limit as $n \rightarrow \infty$ to conclude that $\|{\bf b}\|_q = \| U({\phi}) \|_q \leq \| \phi \|_{\rm op}$. In other words, $U$ is well defined and bounded.
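
The identities $(\dagger)$ and $(\ddagger)$, and the ratio computed from them, can be verified numerically for sample values of the $b_j$'s (a hypothetical example with $p = 3$, $q = 3/2$):

```python
# Check (dagger) and (ddagger) for sample values b_j = phi(e_j),
# with p = 3, q = 3/2, so that (q - 1) p = q.
p, q = 3.0, 1.5
b = [0.7, -1.3, 2.0, -0.2]
a = [abs(bj) ** q / bj for bj in b]              # a_j = |b_j|^q / b_j

sum_bq = sum(abs(bj) ** q for bj in b)
norm_a_p = sum(abs(aj) ** p for aj in a) ** (1 / p)
phi_a = sum(aj * bj for aj, bj in zip(a, b))     # phi(a) = sum_j a_j b_j

assert abs(norm_a_p - sum_bq ** (1 / p)) < 1e-9           # (dagger)
assert abs(phi_a - sum_bq) < 1e-9                         # (ddagger)
assert abs(phi_a / norm_a_p - sum_bq ** (1 / q)) < 1e-9   # the ratio above
```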

Now it is a piece of cake to show that $U$ and $T$ are inverses: \[
\begin{align}
U(T({\bf x})) = U(\phi_{\bf x})
&= (\phi_{\bf x}({\bf e}_1), \phi_{\bf x}({\bf e}_2), \phi_{\bf x}({\bf e}_3), \dotsc) \\[1mm]
&= (x_1, x_2, x_3, \dotsc) = {\bf x}
\end{align}
\] and \[
\begin{align}
T(U(\psi))({\bf y}) = \phi_{U(\psi)}({\bf y})
&= \sum_{i=1}^{\infty}{\psi({\bf e}_i) y_i} \\[1.5mm]
&= \sum_{i=1}^{\infty}{\psi(y_i {\bf e}_i)} \\[1.5mm]
&= \psi \left( \sum_{i=1}^{\infty}{y_i {\bf e}_i} \right)
= \psi({\bf y})
\end{align}
\] i.e. $T(U(\psi))$ is the same functional as $\psi$ since they have the same action on any input vector ${\bf y} \in \ell^p$. Note that moving $\psi$ inside the infinite sum in the second-to-last step is justified by the continuity of $\psi$, since the partial sums $\sum_{i=1}^{N}{y_i {\bf e}_i}$ converge to ${\bf y}$ in the $\ell^p$ norm.

Finally, we showed at the beginning of the proof that for any ${\bf x} \in \ell^q$, we have $\| \phi_{\bf x} \|_{\rm op} = \| T({\bf x}) \|_{\rm op} \leq \|{\bf x}\|_q$.

We also showed, using $(\dagger)$ and $(\ddagger)$, that if ${\bf x} \in \ell^q$ is such that ${\bf x}=U(\phi)$ for some $\phi \in (\ell^p)^*$, then $\| \phi \|_{\rm op} \geq \|{\bf x}\|_q$. But since $U$ and $T$ are inverses, ${\bf x}=U(\phi) \iff \phi = T({\bf x}) = \phi_{\bf x}$, so $\| \phi_{\bf x} \|_{\rm op} = \| T({\bf x}) \|_{\rm op} \geq \|{\bf x}\|_q$. Thus, $\|T({\bf x})\|_{\rm op} = \|{\bf x}\|_q$, which proves that $T$ is an isometry.
$\square$
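
Finally, a numerical illustration of the isometry (again with the hypothetical conjugate pair $p = 3$, $q = 3/2$): for a finite-support ${\bf x} \in \ell^q$, the ratio $|\phi_{\bf x}({\bf y})| / \|{\bf y}\|_p$ stays below $\|{\bf x}\|_q$ over random trial vectors, while the norming vector from the proof attains the bound:

```python
import random

p, q = 3.0, 1.5
x = [2.0, -1.0, 0.5]                              # finite-support element of l^q
norm_x_q = sum(abs(t) ** q for t in x) ** (1 / q)

def phi_x(y):
    """The functional T(x) acting on a (finite-support) y."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Random trial vectors never exceed ||x||_q (Hölder) ...
random.seed(2)
best = 0.0
for _ in range(5000):
    y = [random.gauss(0, 1) for _ in x]
    norm_y_p = sum(abs(t) ** p for t in y) ** (1 / p)
    best = max(best, abs(phi_x(y)) / norm_y_p)
assert best <= norm_x_q + 1e-9

# ... and the norming vector a_j = |x_j|^q / x_j attains the bound,
# so ||phi_x||_op = ||x||_q.
a = [abs(t) ** q / t for t in x]
norm_a_p = sum(abs(t) ** p for t in a) ** (1 / p)
assert abs(abs(phi_x(a)) / norm_a_p - norm_x_q) < 1e-9
```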

It's worth noting that the proof above can be slightly modified to prove that $(\ell^1)^* \cong \ell^{\infty}$, the space of all bounded sequences. On the other hand, $(\ell^{\infty})^* \ncong \ell^1$ and is a bit more complicated. However, it is the case that $(c_0)^* \cong \ell^1$, where $c_0$ is the subspace of $\ell^{\infty}$ consisting of all sequences which converge to $0$.

That will conclude this post, and I hope it was informative/enjoyable. Feel free to post any questions in the comments section.
