
Basis and Dimension

Preliminary: Euclidean Space and Vectors

A sheet of paper is two-dimensional because, given any reference point on the sheet, which we can call $(0,0)$, any other point can be specified by how far to the right and how far up it is from the reference. This is just the notion of an arrow with a magnitude and direction from the first post on vectors.

Similarly, space as we know it is 3-dimensional because, given any reference point, which we can call $(0,0,0)$, any other point can be specified by its distance in length, width, and depth from the reference.

In this post, we'll formalize the dimension of a vector space by introducing the concept of a basis. I will introduce the basic terminology as well as full proofs of a few important results; while this won't be an exhaustive linear algebra primer by any means, any further questions are welcome in the comments section.


Basis of a vector space


Sticking with the 3-dimensional space example, let's call the $x$-axis the left-right direction, so that $(1,0,0)$ corresponds to a step one unit to the right. A step forward would be along the $y$-axis, or $(0,1,0)$, and a step upwards would be along the $z$-axis at $(0,0,1)$. As discussed in the preliminary post, an arrow of any direction and magnitude can be written as a linear combination of these 3 vectors. In symbols, if we call these three ${\bf e}_1$, ${\bf e}_2$, and ${\bf e}_3$, then any vector ${\bf v}$ can be written as $$
{\bf v} = a_1 {\bf e}_1 + a_2 {\bf e}_2 + a_3 {\bf e}_3
$$ for some scalars $a_1, a_2, a_3$. We write this vector as ${\bf v} = (a_1, a_2, a_3)$, implicitly expressing it in terms of the ${\bf e}_i$'s. Since any vector can be written this way, we say that the ${\bf e}_i$'s span the space ${\Bbb R}^3$.

If a vector ${\bf v}$ can be written as a linear combination of a set of other vectors as above, then ${\bf v}$ is said to be linearly dependent on those vectors. A set of vectors is called linearly independent if none of its members can be written as a linear combination of the others; equivalently, the only linear combination of them that yields ${\bf 0}$ is the one with all coefficients equal to $0$. Clearly, we can't combine any of the ${\bf e}_i$ vectors above to obtain another one of them, so they are linearly independent.

A set of vectors which (1) is linearly independent and (2) spans the entire vector space is called a basis, and the ${\bf e}_i$'s above are called the standard basis of ${\Bbb R}^3$. Essentially, a basis is a "minimal" (see the next section) set which still spans all of $V$ using the vector space operations of vector addition and scalar multiplication: the linear independence requirement ensures that a basis excludes any extra "trash" that doesn't add to the span. For example, since scalar multiplication is built into any vector space, if we have ${\bf e}_1 = (1,0,0)$ in our basis, we clearly don't also need $(2,0,0) = 2 {\bf e}_1$; any vector expressed as a linear combination including $(2,0,0)$ could include ${\bf e}_1$ instead with the coefficient doubled, so $(2,0,0)$ really isn't adding to the span.

On the other hand, we could remove $(1,0,0)$ from the standard basis and replace it with $(2,0,0)$ to obtain a different basis, so a basis certainly isn't unique. In fact, we will prove below that any linearly independent set of vectors can be extended to a basis. However, the constants $a_i$ which express a given vector in terms of a particular basis are unique: if different constants $b_i$ existed, then we could have a vector ${\bf v}$ such that $$
\begin{align}
{\bf v} &= a_1 {\bf e}_1 + a_2 {\bf e}_2 + a_3 {\bf e}_3 \\[1mm]
&= b_1 {\bf e}_1 + b_2 {\bf e}_2 + b_3 {\bf e}_3 \\[2mm]
\implies {\bf v} - {\bf v} = {\bf 0} &= (a_1-b_1) {\bf e}_1 + (a_2-b_2) {\bf e}_2 + (a_3-b_3) {\bf e}_3
\end{align}
$$ If $a_i \neq b_i$ for at least one value of $i$, then this would contradict the fact that the basis vectors are linearly independent (since a linear combination of them without all coefficients equal to 0 yields the zero vector). Thus, we must have $a_i=b_i$ for all values of $i$, i.e. the representation of ${\bf v}$ in terms of the particular basis is unique. The same argument applies to vector spaces other than just ${\Bbb R}^3$.
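
To see the uniqueness of these coefficients in action, here's a quick numerical sketch in Python with NumPy; the basis $\{(1,0,0),\,(2,1,0),\,(0,0,3)\}$ and the target vector are just made-up examples, and `np.linalg.solve` finds the unique coefficients because the basis vectors (the columns of the matrix) are linearly independent.

```python
import numpy as np

# Made-up example: a (non-standard) basis of R^3 and a target vector v.
b1, b2, b3 = (1, 0, 0), (2, 1, 0), (0, 0, 3)
B = np.column_stack([b1, b2, b3]).astype(float)  # basis vectors as columns
v = np.array([5.0, 2.0, 6.0])

a = np.linalg.solve(B, v)     # the unique coefficients a_1, a_2, a_3
print(a)                      # [1. 2. 2.]
print(np.allclose(B @ a, v))  # True: v = 1*b1 + 2*b2 + 2*b3
```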


Dimension of a vector space


The number of vectors forming a basis (which is infinite for some spaces) is called the dimension of the space. ${\Bbb R}^3$ has dimension $3$ because it has $\{ {\bf e}_1, {\bf e}_2, {\bf e}_3 \}$ as a basis, while the $\ell^p$ spaces from the last post are infinite-dimensional: the standard unit sequences, each with a $1$ in the $i$-th component and zeros elsewhere, form an infinite linearly independent set, so no finite basis can exist.

Defining dimension in this way implies that every basis has the same number of elements. This is indeed the case, and the proof for finite-dimensional spaces is a consequence of the following related lemma.

Lemma: If $B = \{ {\bf b}_1, {\bf b}_2, \dotsc, {\bf b}_n \}$ is a basis for a vector space $V$, and $W = \{ {\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_m \}$ is a linearly independent set of vectors, then $m \leq n$. If $m=n$, then $W$ spans $V$ and is thus another basis.

Proof: Since $B$ is a basis for $V$, we can express ${\bf w}_1$ as a linear combination of the vectors in $B$, $$
{\bf w}_1 = c_1 {\bf b}_1 + c_2 {\bf b}_2 + \dotsb + c_n {\bf b}_n
$$ where at least one of the $c_i$'s is nonzero (indeed ${\bf w}_1 \neq {\bf 0}$, since a linearly independent set cannot contain the zero vector). Let $j$ be an index such that $c_j \neq 0$. Then we have $$
{\bf b}_j = \tfrac{1}{c_j}{\bf w}_1 - \tfrac{c_1}{c_j}{\bf b}_1 - \dotsb - \tfrac{c_{j-1}}{c_j}{\bf b}_{j-1} - \tfrac{c_{j+1}}{c_j}{\bf b}_{j+1} - \dotsc - \tfrac{c_n}{c_j}{\bf b}_n
$$ Since $B$ is a basis, any vector ${\bf v} \in V$ can be written as a linear combination of the ${\bf b}_i$'s: $$
\begin{align}
{\bf v} &= v_1 {\bf b}_1 + v_2 {\bf b}_2 + \dotsb + v_n {\bf b}_n \\[2mm]
&= v_1 {\bf b}_1 + \dotsb + v_{j-1} {\bf b}_{j-1} \\[1mm]
& \ \ \ + v_j \underbrace{\left[ \tfrac{1}{c_j}{\bf w}_1 - \tfrac{c_1}{c_j}{\bf b}_1 - \dotsb - \tfrac{c_{j-1}}{c_j}{\bf b}_{j-1} - \tfrac{c_{j+1}}{c_j}{\bf b}_{j+1} - \dotsc - \tfrac{c_n}{c_j}{\bf b}_n \right]}_{{\bf b}_j} \\[1mm]
& \ \ \  +  v_{j+1} {\bf b}_{j+1} + \dotsb + v_n {\bf b}_n
\end{align}
$$ This eliminates ${\bf b}_j$ from the expansion, showing that the arbitrary vector ${\bf v}$ is also a linear combination of the set $B_1 = \{ {\bf b}_1, {\bf b}_2, \dotsc , {\bf b}_{j-1}, {\bf w}_1, {\bf b}_{j+1}, \dotsc , {\bf b}_n \}$. Thus, $B_1$ spans $V$.

Furthermore, $B_1$ is linearly independent: since ${\bf w}_1$ can be uniquely expressed as a linear combination of the vectors in the original basis $B$, a linear combination of the vectors in $B_1$ can be uniquely rewritten as a linear combination of vectors in $B$ by replacing ${\bf w}_1$ by its expansion $c_1 {\bf b}_1 + c_2 {\bf b}_2 + \dotsb + c_n {\bf b}_n$. Since that unique expansion includes a nonzero factor of ${\bf b}_j$, the resulting linear combination of elements of $B$ has at least one nonzero coefficient. Therefore, if such a combination yielded ${\bf 0}$, that would contradict the fact that $B$ (assumed to be a basis) is linearly independent. So $B_1$ must be linearly independent and thus is a basis for $V$.

Now, since $B_1$ is a basis, we can uniquely express ${\bf w}_2$ as a linear combination $$
{\bf w}_2 = d_1 {\bf b}_1 + d_2 {\bf b}_2 + \dotsb + d_{j-1} {\bf b}_{j-1} + d_j {\bf w_1} + d_{j+1} {\bf b}_{j+1} + \dotsb + d_n {\bf b}_n
$$ Since $W$ is linearly independent, it is not the case that ${\bf w}_2 = d_j {\bf w}_1$ with all other $d_i$'s equal to zero, so there exists a $d_k \neq 0$ with $k \neq j$. Then we can repeat the whole process to kick out ${\bf b}_k$ and replace it with ${\bf w}_2$, yielding a new basis $B_2$.

If $m=n$, then we can continue in this fashion until we kick out the last of the original vectors from $B$ at the $n$-th iteration, obtaining a new basis $B_n$ consisting of the vectors ${\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_n$. This proves that any linearly independent set of $n$ vectors is a basis.

Furthermore, if $m>n$, we can continue the process and uniquely express ${\bf w}_{n+1}$ as a linear combination of the basis $B_n = \{ {\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_n \}$. This is a contradiction, since $W$ is assumed to be linearly independent. Thus, it must be the case that $m \leq n$.
$\square$

Corollary: If $A = \{ {\bf a}_1, \dotsc {\bf a}_m \}$ and $B = \{ {\bf b}_1, \dotsc {\bf b}_n \}$ are two bases for a finite-dimensional vector space $V$, then $m=n$.

Proof: By the above lemma, $m \leq n$ since $A$ is linearly independent, and $B$ is a basis. Also, $n \leq m$ since $B$ is linearly independent, and $A$ is a basis. Therefore, $m=n$.
$\square$

Note that this also allows us to conclude that in a vector space $V$ with dimension $n$, a set with fewer than $n$ vectors cannot span $V$. Thus, a basis is a minimal set that still spans all of $V$. On the other hand, by the lemma, a basis is a maximal set of vectors that is still linearly independent.


Extending to a basis


If we have a vector space $V$ with finite dimension $n$ and a set $S$ of $p$ linearly independent vectors with $p<n$, then we can extend $S$ to a basis for $V$.

Indeed, since $S$ has only $p$ vectors, it cannot span $V$. Thus, there exists a vector ${\bf v}_1$ such that ${\bf v}_1$ is not in the span of $S$ (the set of all linear combinations of the vectors in $S$). So $S \cup \{ {\bf v}_1 \}$ is a linearly independent set of $p+1$ vectors.

Continuing in this manner until we have exactly $n$ vectors (which will take $n-p$ steps), we will obtain a set $S'$ of $n$ linearly independent vectors. The lemma above implies that $S'$ will span $V$ and thus be a new basis.

In practice, finding the vectors ${\bf v}_1, {\bf v}_2, \dotsc, {\bf v}_{n-p}$ as described above consists of either guess-and-check or solving sets of linear equations. For example, if we have the vectors $(0,3,0)$ and $(1,0,-2)$ and would like to extend to a basis for ${\Bbb R}^3$, then we need to find a vector ${\bf x} = (x_1, x_2, x_3)$ such that ${\bf x}$ is linearly independent from the first two vectors. If ${\bf x}$ were a linear combination, then we would have:$$
(x_1, x_2, x_3) = a_1 (0,3,0) + a_2 (1,0,-2)
$$ This gives us 3 equations: $$
\begin{align}
x_1 &= a_2 \\[2mm]
x_2 &= 3 a_1 \\[2mm]
x_3 &= -2 a_2
\end{align}
$$ We just need to find one vector which does not satisfy these equations, regardless of the choice of the $a_i$'s. Since the first and third equations imply that $-\tfrac{1}{2}x_3 = x_1$, any vector not satisfying that equation will work; for example, ${\bf x} = (0,0,1)$. The solution is far from unique: ${\bf x} = (0,0,x_3)$ works for any nonzero value of $x_3$, and there are infinitely many other solutions as well.

While the system of equations above was rather easy, we can use the more formal method of Gaussian elimination to simplify tougher systems.
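
As a small illustration of that check, here is a sketch in Python with NumPy; the helper `extends_to_basis` and the candidate vectors are my own made-up names and examples, not anything from a standard library. The idea is just that a candidate ${\bf x}$ extends $\{(0,3,0), (1,0,-2)\}$ to a basis of ${\Bbb R}^3$ exactly when stacking the three vectors into a matrix gives rank $3$.

```python
import numpy as np

S = [np.array([0.0, 3.0, 0.0]),
     np.array([1.0, 0.0, -2.0])]

def extends_to_basis(candidate, vectors=S):
    """True if `candidate` together with `vectors` is a linearly independent set in R^3."""
    M = np.vstack(vectors + [np.asarray(candidate, dtype=float)])
    return np.linalg.matrix_rank(M) == 3

print(extends_to_basis([0, 0, 1]))   # True:  (0,0,1) is not in the span
print(extends_to_basis([1, 5, -2]))  # False: satisfies x1 = -x3/2, so it lies in the span
```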


Extending to a basis: the infinite-dimensional case


In an infinite-dimensional vector space, we can also extend a linearly independent set to a basis, but the proof of this fact is not as simple as setting up a system of linear equations to solve (there would be infinitely many equations). In fact, the proof relies on the axiom of choice and thus does not show us how to actually perform the basis extension. For the sake of completeness, I'll provide a sketch of the proof anyway, with the warning that it won't make much sense to those not somewhat well versed in set theory.

Sketch of proof: Let $V$ be a vector space, not necessarily finite-dimensional, and let $S$ be a linearly independent subset of $V$. Let $A$ be the collection of sets whose members are the linearly independent subsets of $V$ which contain $S$. In symbols, $$
A = \{ T \subseteq V : \ S \subseteq T \text{ and $T$ is linearly independent} \}
$$ An arbitrary chain in $A$ is a totally ordered subcollection; picture an increasing sequence $T_1 \subseteq T_2 \subseteq T_3 \subseteq \dotsb$ with each $T_i \in A$ (though a chain need not be countable). The union $T_{\cup}$ of all the sets in the chain is an upper bound for the chain: it is bigger (as defined by the $\subseteq$ relation) than each element of the chain, and it still belongs to $A$, since any finite collection of vectors from $T_{\cup}$ already lies in some single set of the chain and is therefore linearly independent.

Since all chains have an upper bound, Zorn's Lemma (equivalent to the axiom of choice, though I will omit the proof of this fact) implies that $A$ has a maximal element, i.e. an element $M \in A$ which is not properly contained in any other element of $A$. Suppose $M$ does not span $V$. Then there exists a vector ${\bf z} \in V$ which is not a linear combination of the vectors in $M$. But then $M' = M \cup \{ {\bf z} \}$ is a linearly independent set properly containing $M$ and thus containing $S$. So $M' \in A$, a contradiction since $M$ was supposed to be a maximal element. Thus it must be the case that $M$ does span $V$, so $M$ is a basis.
$\square$

As with most AC-based proofs, this one isn't very satisfying, but there you have it: even in the infinite-dimensional case, you can extend a linearly independent set to a basis (but I can't tell you how!).

That will do it for this post- thanks for reading and please post any questions in the comments section.

The Axiom of Choice

Preliminary: set basics

The axiom of choice is a somewhat controversial axiom of set theory which is frequently utilized but not well known by most non-math folks. In this brief post, I'll demystify the axiom of choice and explain some of the issues that arise from its use.


Statement of the Axiom


The axiom of choice (or "AC" for short) states that, given a collection of non-empty sets, there exists a way to choose one element from each of them.

In order to state the axiom more precisely, we first need to define a choice function. Given a collection $X$ of non-empty sets, a choice function is a function which assigns to each set in the collection one of its own elements. In symbols, it is a function $$
f: X \rightarrow \bigcup_{A \in X}{A}
$$ with the property that for all $A \in X$, $f(A) \in A$.

Take a second to read through that again as it can be a bit confusing: $X$ is a set whose elements are sets. If $A$ is a set in this collection/set of sets (i.e. an element of $X$), then $A$ is assumed to be non-empty and thus has at least one element. A choice function takes sets like $A$ as inputs, and assigns to them an output from the set $\bigcup_{A \in X}{A}$ of all elements of sets (like $A$) in the collection. This output must be an element of $A$ itself (the input "value") every time in order for $f$ to be a choice function. Put differently, $f$ cannot assign to any set $A$ an output value $b$, where $b \in B$ for some other set $B \in X$ with $B \neq A$.

With that out of the way, the axiom of choice states that for any collection $X$ of non-empty sets, a choice function exists. This is the rigorous expression of the plain-English version above: given a collection of non-empty sets, there exists a way to choose one element from each of them.


"Sounds pretty obvious. What's the big controversy?"


As a practical example, suppose we have a bunch of buckets with items inside. The AC just tells us that we can pick one item from each. For a finite number of buckets, this really is obvious and actually can be proven by induction from other basic axioms of set theory.

However, suppose we have a countably infinite number of buckets, i.e. we have numbered buckets labeled $1, 2, 3, 4, \dotsc$ going on forever. You can go to the first bucket and reach in to choose an item, then the second, then the third, etc. For any finite number of buckets, this sequence of events will obviously terminate. But since the buckets go on forever, at no point will you ever have chosen an item from every bucket; a choice function needs to specify an item for every bucket. It's even worse if you have uncountably many buckets, e.g. if you have one bucket for every real number instead of just the numbers $1,2,3, \dotsc$.

Now, if you have a formulaic way to specify which item should be chosen from each bucket, then there's actually still no issue in defining a choice function. For example, suppose each of the infinitely many buckets contains several slips of paper, each with a single number $1,2,3, \dotsc$ on it and with no repeats within any buckets. Then each bucket would contain a slip with the smallest label. Thus, there exists a choice function for these buckets: choose the smallest label from each bucket. This is a precise specification for a selection of one item from every bucket.
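
Just to make the phrase "formulaic way" concrete, here's a tiny Python sketch; the buckets are made-up finite stand-ins for the labeled slips, but the point is that the rule "take the smallest label" literally is a choice function, with no appeal to the AC needed.

```python
# A "formulaic" choice function: from each bucket, choose the smallest label.
# The buckets below are made-up finite examples; the rule itself makes sense for
# any family of non-empty sets of positive integers.
buckets = {
    "bucket_1": {7, 2, 9},
    "bucket_2": {4, 11},
    "bucket_3": {3},
}

def choice(bucket):
    # Well-defined: every non-empty set of positive integers has a least element.
    return min(bucket)

print({name: choice(b) for name, b in buckets.items()})
# {'bucket_1': 2, 'bucket_2': 4, 'bucket_3': 3}
```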

The issue arises when there is no formulaic way to make a choice for all the buckets. For example, suppose $X$ is the set of all non-empty subsets of the real number line. This $X$ is uncountably infinite and contains some subsets such as $\{3, 123412, \pi, \text{all numbers greater than a million} \}$ for which we can still say, "choose the smallest element" (in this case, $3$). But $X$  also contains sets such as $(0,1) = \{x \in {\Bbb R} \ | \ 0 < x < 1 \}$ which have no smallest element. There is no formulaic way to specify how one would choose one element from every non-empty subset of the real numbers.


"Fine, I admit it can be a little dicey: clearly, you can't always write some kind of computer algorithm every time to create a choice function in practice. But does assuming the AC really cause any harm?"


Sort of, but it depends on your point of view.

One way to prove something exists is to specify an algorithm to construct it; this is called a constructive proof. On the other hand, when a proof of something's existence relies on the axiom of choice, it doesn't tell you how to actually construct the "something".

This may not be a problem if we don't care about constructing the thing in practice. But the AC also allows one to prove some bizarre results such as the Banach-Tarski paradox, which states that a ball can be cut into finitely many pieces and reassembled into two copies of the original ball just by moving and rotating the pieces (without any stretching or shrinking). This contradicts basic geometric intuition, since each of the two new balls has the same volume as the original, so the total volume appears to have doubled. The role of the axiom of choice here is that it allows for the construction of sets of points for which "volume" is not defined (non-measurable sets). Of course, it doesn't actually tell us how to do this, only that it can be done.

To conclude, even though the AC seems obvious, it amounts to assuming the theoretical existence of objects which can never be built in practice. As a result, these objects (such as non-measurable sets) are basically impossible to picture or understand. These never arise in practical applications since you can't actually construct them without the AC, but they lead to some counterintuitive, seemingly paradoxical results.

In applications such as the stochastic calculus used in financial math, you will see the analysis restricted to measurable sets, as everything you can think up (without invoking the AC) will correspond to one of these anyway. I hope this post helps explain why authors need to go through all the trouble of making these restrictions, and also that it helps you recognize instances in which the AC has been invoked in your future forays into pure math.

Functions as Vectors (Part 1)

Preliminaries:


The previous post covered convergence in function spaces, and we saw that different types of convergence correspond to different topologies on the underlying space.

In this post, I will add a few more tools (namely norms and inner products) to the vector space/linear algebra toolbox and show you how we introduce a vector space structure to function spaces. I will focus on the sequence spaces $\ell^p$ as illustrative examples, concluding with a proof of Hölder's inequality for sums.

In Part 2 of this post, I will introduce the notion of the dual space and prove an important result about the $\ell^p$ spaces to complete a recent reader request.

Engineers and physicists in the audience will recall that the function space formalism provides the mathematical foundation for Fourier analysis and quantum mechanics. It also comes into play in the modeling of randomness, on which I am planning some upcoming posts (stay tuned).


Vector space review


In "Euclidean space", I introduced vectors in ${\Bbb R}^n$ as ordered $n$-tuples, which, for $n=2$ and $n=3$, can be thought of as arrows with a length and direction. We saw that we can define addition of vectors as addition of the respective components, i.e. we apply the usual addition of real numbers to each component of the two vectors. We can also "scale" a vector by a number (called a scalar in this context) by multiplying each component by that number, once again in the usual real number sense.

In symbols (using $n=3$ for now), if ${\bf x} = (x_1 , x_2, x_3)$ and ${\bf y} = (y_1 , y_2 , y_3)$ are vectors, and $c$ is a scalar: $$
\begin{align}
{\bf x} + {\bf y} & \buildrel {\rm def} \over{=} (x_1 + y_1, x_2 + y_2, x_3 + y_3) \\[2mm]
c \, {\bf x} & \buildrel {\rm def} \over{=} (cx_1, cx_2, cx_3)
\end{align}
$$ ${\Bbb R}^n$ with addition and scalar multiplication defined in this way satisfies the vector space axioms (refer to the preliminary post).

If we have some set other than ${\Bbb R}^n$ and define a way to add its elements together and multiply them by a scalar (usually from the real numbers $\Bbb R$ or the complex numbers $\Bbb C$, but technically from any field), and if these definitions satisfy the vector space axioms, then the set endowed with these operations is called a vector space, and its elements are called vectors. Thus, the term vector encompasses a more general class of objects than the motivating example of arrows with 2 or 3 components.

The study of vector spaces is called linear algebra and is one of the best understood areas of mathematics. Verifying that a set endowed with a definition of addition and scalar multiplication is a vector space immediately implies that all the theorems proven about vector spaces (of which there are many) apply. As you may have already surmised, in this post, we'll be looking at vector spaces in which the vectors are functions, i.e. function spaces.


Inner products, norms, and the distance formula


Recall also from the Euclidean space post that we defined the dot product of two vectors in ${\Bbb R}^n$ as ${\bf x} \cdot {\bf y} = \sum_{i=1}^{n}{x_i y_i}$. The dot product satisfies 3 axioms which make it a so-called inner product (in fact, the dot product inspired the definition of inner products):

Symmetry*:
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf y} = {\bf y} \cdot {\bf x}$
Linearity in the first argument:
$\ \ \ \ \ \ \ \ (c \, {\bf x}) \cdot {\bf y} = c \, ({\bf x} \cdot {\bf y})$
$\ \ \ \ \ \ \ \ ({\bf x} + {\bf z}) \cdot {\bf y} = ({\bf x} \cdot {\bf y}) + ({\bf z} \cdot {\bf y})$
Positive-definiteness:
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf x} \geq 0$
$\ \ \ \ \ \ \ \ {\bf x} \cdot {\bf x} = 0 \iff {\bf x} = {\bf 0}$

*Note: when dealing with vector spaces over the field of complex numbers instead of real numbers, the symmetry property is replaced with conjugate symmetry: ${\bf x} \cdot {\bf y} = \overline{{\bf y} \cdot {\bf x}}$, where the bar over the right side is complex conjugation: $\overline{a + bi} := a - bi$. We won't worry about complex vector spaces in this post.

That the dot product satisfies these properties is very easy to check. For example: $$
\begin{align}
({\bf x} + {\bf z}) \cdot {\bf y}
&= \sum_{i=1}^{n}{(x_i + z_i) y_i} \\
&= \sum_{i=1}^{n}{(x_i y_i + z_i y_i)} \\
&= \sum_{i=1}^{n}{x_i y_i} + \sum_{i=1}^{n}{z_i y_i} \\
&= ({\bf x} \cdot {\bf y}) + ({\bf z} \cdot {\bf y})
\end{align}
$$
A vector space with an inner product is called an inner product space. Inner products are often denoted $\langle {\bf x} , {\bf y} \rangle$, and I will use this notation for the remainder of this post.

If we have an inner product, we automatically get a way to specify the "size" or magnitude of a vector ${\bf x}$ by the definition $\| {\bf x} \| \buildrel \rm{def} \over{=} \sqrt{\langle {\bf x}, {\bf x} \rangle}$. This measure of magnitude satisfies 3 properties, as a direct consequence of the properties an inner product must satisfy, which make it a so-called norm:

Positive-definiteness:
$\ \ \ \ \ \ \ \ \| {\bf x} \| \geq 0$
$\ \ \ \ \ \ \ \ \|{\bf x} \| = 0 \iff {\bf x} = {\bf 0}$
Scaling:
$\ \ \ \ \ \ \ \ \| c\, {\bf x} \| = |c| \| {\bf x} \|$
Triangle inequality:
$\ \ \ \ \ \ \ \ \| {\bf x} + {\bf y}\| \leq \| {\bf x} \| + \| {\bf y} \|$

A vector space with a norm is called a normed vector space. The norm also gives us a way to measure the distance between two vectors by the definition $$
d({\bf x}, {\bf y}) \buildrel \rm{def} \over{=} \| {\bf y} - {\bf x} \|
$$ which, by the way, is automatically a metric due to the properties of norms. In the inner product space ${\Bbb R}^3$ (with the dot product as the inner product), this formula yields $$
d({\bf x}, {\bf y}) = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2}
$$ which is the well-known (Euclidean) distance formula.
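
Here's a quick numerical sanity check of these definitions in Python with NumPy; the two vectors are made-up examples chosen so the numbers come out nicely.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])    # made-up example vectors in R^3
y = np.array([4.0, 6.0, 2.0])

inner = np.dot(x, y)              # <x, y> = 1*4 + 2*6 + 2*2 = 20
norm_x = np.sqrt(np.dot(x, x))    # ||x|| = sqrt(<x, x>) = 3
dist = np.linalg.norm(y - x)      # d(x, y) = ||y - x|| = sqrt(3^2 + 4^2 + 0^2) = 5

print(inner, norm_x, dist)        # 20.0 3.0 5.0
```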


The $\ell^2$ space


Let's start by considering a vector space that is a natural generalization of the familiar ${\Bbb R}^n$, the set of all $n$-dimensional "arrows", i.e. ordered $n$-tuples of real numbers ${\bf x} = (x_1, x_2, x_3, \dotsc , x_n)$. We made this set into a vector space by defining vector addition and scalar multiplication as the obvious component-wise operations. We'll now look at an infinite-dimensional analog of this space.

Consider the set of infinite sequences ${\bf x} = (x_1, x_2, x_3, \dotsc)$ of real numbers which are square-summable, i.e. $\sum_{i=1}^{\infty}{x_i^2} < \infty$. We'll see the reason for this restriction in a moment. This is a subset of the set of functions ${\Bbb R}^{\Bbb N}$ which we will call $\ell^2$ (I'll explain this notation later in the post). We can give $\ell^2$ a vector space structure using the following definitions, completely analogous to the ${\Bbb R}^n$ ones:

The zero vector is the sequence of all zeros:
${\bf 0} = (0,0,0, \dotsc)$

Vector addition is defined component-wise: 
$(x_1, x_2, x_3, \dotsc) + (y_1, y_2, y_3, \dotsc) \buildrel \rm{def} \over{=} (x_1 + y_1, x_2 + y_2, x_3 + y_3, \dotsc)$

Scalar multiplication is defined similarly: 
$c \, (x_1, x_2, x_3, \dotsc) \buildrel \rm{def} \over{=} (cx_1, cx_2, cx_3, \dotsc)$

It is routine to show that this set with these definitions satisfies the vector space axioms, and thus we can refer to these sequences as "vectors".

Furthermore, since we are only considering sequences that are square-summable, thus excluding ones like $(1,1,1, \dotsc )$, we can define the $\ell^2$ norm of a vector/sequence to be $$
\|{\bf x}\|_{2} \buildrel \rm{def} \over{=} \sqrt{\sum_{i=1}^{\infty}{x_i^2}}
$$ and know that this infinite sum converges to a finite value. Once again, this definition is very similar to the formula in ${\Bbb R}^n$. In the latter case, the norm was induced by the inner product in the sense that $\| {\bf x} \| = \sqrt{\langle {\bf x},{\bf x}\rangle}$. This is also the case for the norm in our space of sequences if we define the inner product using the formula $$
\langle {\bf x},{\bf y} \rangle \buildrel \rm{def} \over{=} \sum_{i=1}^{\infty}{x_i y_i}
$$ It is obvious that the above formula indeed defines an inner product, but there is one potential issue with this definition: we don't know a priori that the infinite sum on the right-hand side actually converges for any two square-summable sequences ${\bf x}$ and ${\bf y}$. That it does is a consequence of a version of the Cauchy-Schwarz inequality for infinite sums; at the end of the post, I will prove the more general Hölder's inequality, so for now, take my word for it that the series does converge, and thus that this inner product is well defined.


The $\ell^p$ spaces


The more general analogs of the $\ell^2$ space are the $\ell^p$ spaces, consisting of all sequences ${\bf x} = (x_1, x_2, x_3, \dotsc )$ for which $\sum_{i=1}^{\infty}{|x_i|^p} < \infty$. The larger we make $p$, the easier it is for the series to converge, as the sequence values less than 1 "drop off" more quickly. The $p$-series for ${\bf x} = (1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \dotsc )$ illustrate this concept perfectly (you may remember these from your Calc II class):



As a result, $\ell^p \subset \ell^s$ for $p<s$; put differently, $\ell^s$ contains more sequences than $\ell^p$, since the larger exponent $s$ makes it easier for the series $\sum_{i}{|x_i|^s}$ to converge.

We can define the $\ell^p$ norms by the following formula for $1 \leq p < \infty$: $$
\| {\bf x} \|_{p} \buildrel{\rm def} \over{=} \left( \sum_{i=1}^{\infty}{|x_i|^p} \right)^{1/p}
$$ When $0<p<1$, this formula fails to define a norm, because it doesn't satisfy the triangle inequality: as a counterexample, take ${\bf x} = (1,0,0,0, \dotsc )$ and ${\bf y} = (0,1,0,0, \dotsc )$. Then $$
\| {\bf x} + {\bf y} \|_{p} = \| (1,1,0,0, \dotsc ) \|_{p} = 2^{1/p} > 2 = 1 + 1 = \| {\bf x} \|_{p} + \| {\bf y} \|_{p}
$$ so $\| {\bf x} + {\bf y} \|_{p} \nleq \| {\bf x} \|_{p} + \| {\bf y} \|_{p}$ in this case. Geometrically, the counterexample illustrates that when $0 < p < 1$, the unit ball (i.e. the open ball of radius 1 centered at ${\bf 0}$) is not convex: it contains points which we can connect with a straight line, and that straight line will contain points outside the unit ball:

Unit balls in the $\ell^p$ norms (on ${\Bbb R}^2$) for various values of $p$ - from Wikipedia

For this reason, we will restrict our attention to $p \geq 1$ so that the formula above actually defines a norm (in the interest of brevity, I'll omit the proof of this fact). Finally, the diagram above shows a unit ball for $p=\infty$, which needs to be treated specially. The $\ell^{\infty}$ space is just the set of bounded sequences with the norm $$
\| {\bf x} \|_{\infty} \buildrel{\rm def} \over{=} \sup_{i}{|x_i|}
$$ where this supremum (a synonym for "least upper bound") is guaranteed to exist since the sequences in the space are bounded by definition. Thus in the diagram, the $\ell^{\infty}$ unit ball in the plane consists of the points whose coordinates both have absolute value at most $1$, i.e. the points with $\max (|x|, |y|) \leq 1$, which is a square.
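
Here's a short numerical illustration of these norms; since the example vectors ${\bf x} = (1,0,0,\dotsc)$ and ${\bf y} = (0,1,0,\dotsc)$ have only finitely many nonzero entries, truncating them loses nothing, and we can watch the triangle inequality fail at $p = \tfrac{1}{2}$ while holding at $p = 2$ and $p = \infty$.

```python
import numpy as np

def lp_norm(x, p):
    x = np.asarray(x, dtype=float)
    if p == np.inf:
        return np.max(np.abs(x))   # sup norm; max suffices for finitely supported sequences
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [1, 0]   # (1, 0, 0, ...) truncated to its support
y = [0, 1]   # (0, 1, 0, ...)
s = [1, 1]   # x + y

for p in (0.5, 2, np.inf):
    lhs = lp_norm(s, p)
    rhs = lp_norm(x, p) + lp_norm(y, p)
    print(p, lhs, rhs, lhs <= rhs)
# p=0.5: ||x+y|| = 4.0  >  2.0  -> triangle inequality fails
# p=2  : ||x+y|| ~ 1.41 <= 2.0  -> holds
# p=inf: ||x+y|| = 1.0  <= 2.0  -> holds
```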

Note: We've actually already seen the metric induced by the $\ell^{\infty}$ norm, $\| {\bf x} - {\bf y}\|_{\infty}$, as the uniform distance. Thus convergence of a sequence of functions in the $\ell^{\infty}$ norm is the same as uniform convergence.

Now, you were probably wondering whether, like the $\ell^2$ norm, the other $\ell^p$ norms arise from inner products. Unfortunately, they do not.

Proposition: The $\ell^p$ norm is induced by an inner product if and only if $p=2$.

Proof: We already showed that the $\ell^2$ norm does arise from an inner product. To prove that $p=2$ is the only value for which this is true, note that if $\| \cdot \|$ is any norm arising from an inner product, then for any vectors ${\bf x}, {\bf y}$, we have $$
\begin{align}
\| {\bf x} + {\bf y} \|^2 &= \langle {\bf x} + {\bf y}, {\bf x} + {\bf y} \rangle
= \langle {\bf x}, {\bf x} \rangle
+ \langle {\bf x}, {\bf y} \rangle
+ \langle {\bf y}, {\bf x} \rangle
+\langle {\bf y}, {\bf y} \rangle \\

\| {\bf x} - {\bf y} \|^2 &= \langle {\bf x} - {\bf y}, {\bf x} - {\bf y} \rangle
= \langle {\bf x}, {\bf x} \rangle
- \langle {\bf x}, {\bf y} \rangle
- \langle {\bf y}, {\bf x} \rangle
+\langle {\bf y}, {\bf y} \rangle
\end{align}
$$ Thus, $$
\| {\bf x} + {\bf y} \|^2 + \| {\bf x} - {\bf y} \|^2
= 2 \langle {\bf x}, {\bf x} \rangle
+ 2 \langle {\bf y}, {\bf y} \rangle
= 2 \| {\bf x} \|^2 + 2 \| {\bf y} \|^2
$$ This is known as the parallelogram law, a generalization of the Pythagorean theorem for right triangles.

Vectors involved in the parallelogram law - from Wikipedia

Applying this to the $\ell^p$ norm with ${\bf x} = (1,0,0,0, \dotsc )$ and ${\bf y} = (0,1,0,0, \dotsc )$, we obtain $$
\begin{align}
&&2^{2/p} + 2^{2/p} &= 2 \cdot 1^{2/p} + 2 \cdot 1^{2/p} \\
&\implies &2 \cdot 2^{2/p} &= 2+2 \\
&\implies &2^{(2/p+1)} &= 4 = 2^2 \\
&\implies &2/p+1 &=2 \\
&\implies &p &=2
\end{align}
$$ $\square$

So $\ell^2$ is the only inner product space of the $\ell^p$ family.
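
If you'd like to see the parallelogram law singling out $p=2$ numerically, here's a small sketch using the same two finitely supported vectors as above; only at $p=2$ do the two sides agree.

```python
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(np.asarray(x, dtype=float)) ** p) ** (1.0 / p)

x = np.array([1.0, 0.0])   # (1, 0, 0, ...) truncated to its support
y = np.array([0.0, 1.0])   # (0, 1, 0, ...)

for p in (1, 1.5, 2, 3):
    lhs = lp_norm(x + y, p) ** 2 + lp_norm(x - y, p) ** 2
    rhs = 2 * lp_norm(x, p) ** 2 + 2 * lp_norm(y, p) ** 2
    print(p, round(lhs, 4), rhs)   # the two sides agree only when p == 2
```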


The $L^p$ spaces


The $\ell^p$ spaces are all subsets of ${\Bbb R}^{\Bbb N}$, the space of real-valued sequences. Naturally, these spaces have uncountably infinite analogs which are subsets of ${\Bbb R}^{\Bbb R}$, the space of real-valued functions taking inputs along the entire number line (instead of just $1,2,3,\dotsc$).

For $1 \leq p < \infty$, the $L^p$ space is defined as the set of functions $f: {\Bbb R} \rightarrow {\Bbb R}$ for which $$
\int_{-\infty}^{\infty}{|f(x)|^p \, dx} < \infty
$$ with the norm $$
\| f \|_{p} \buildrel{\rm def} \over{=} \left( \int_{-\infty}^{\infty}{|f(x)|^p \, dx} \right)^{1/p}
$$ The $L^{\infty}$ space is also defined analogously to $\ell^{\infty}$, but with the supremum replaced by the essential supremum (see below). Finally, like the discrete case, the only $L^p$ norm which is induced by an inner product is the $L^2$ norm, with the inner product $$
\langle f,g \rangle \buildrel{\rm def} \over{=} \int_{-\infty}^{\infty}{f(x)g(x) \, dx}
$$ So basically, the $L^p$ spaces are the same as the $\ell^p$ spaces, but with sequences replaced by functions of the entire number line and, accordingly, sums replaced by integrals. However, there are a number of complications which arise when we move from discrete to continuous inputs, namely:

  • The integral is understood to be a Lebesgue integral instead of the usual Riemann integral from calculus. The two agree whenever they are both defined, but the Lebesgue integral is defined for many more functions than the Riemann integral. A rigorous definition requires measure theory, which tells us how to define the measure of a given set of input values. The Lebesgue measure on the real number line is designed such that the measure of an interval $(a,b)$ is $b-a$.
  • The integral is not affected by changes in the function value on a set of measure zero. Any finite set of points has Lebesgue measure zero. Furthermore, any countably infinite set of points, such as the set of all rational numbers, also has measure zero.
  • Because of the last bullet, the members of the $L^p$ spaces are technically not functions, but rather equivalence classes of functions, where the equivalence relation is $$f \sim g \iff f=g \ \ {\rm a.e.}$$ where "a.e." (almost everywhere) means everywhere, except possibly on a set of measure zero.
  • The $L^{\infty}$ norm of a function (technically, an equivalence class of functions) $f$ is defined as the essential supremum of $f$. The essential supremum is the supremum, or least upper bound, of $f$, except possibly on a set of measure zero. For example, if $f(x) = 0$ for $x \neq 5$ and $f(5)=1$, then the supremum of $f$ is $1$, but the essential supremum of $f$ is $0$ since $f \leq 0$ except on the set $\{ 5 \}$, which is a single point and thus has measure zero.
Given the complexity of going into measures, the construction of the Lebesgue integral, and various Lebesgue integral convergence theorems, I won't delve further into the $L^p$ spaces in this post.

The $\ell^p$ spaces are enough to illustrate that function spaces (sequence spaces are a form of function spaces, just with a discrete set of input values) can possess a vector space structure as well as a norm and, for $p=2$, an inner product. Thinking of functions as members of a normed vector space is subtle, but as mentioned at the beginning of the post, it provides the mathematical foundation for numerous applications, some of which I hope to explore in future posts.

Part 2 of this post will explore linear functionals and duality, once again focusing on the $\ell^p$ spaces as a representative example.

I will conclude this post with the deferred proof of Hölder's inequality, but first, we'll need a lemma known as Young's inequality. For both of the proofs below, let $p$ and $q$ be positive real numbers satisfying $\frac{1}{p}+\frac{1}{q}=1$, known as Hölder conjugates.

Lemma (Young's inequality for products): For any two non-negative real numbers $\alpha$ and $\beta$, we have $$
\alpha \beta \leq \frac{\alpha ^ p}{p} + \frac{\beta ^ q}{q}
$$ with equality if and only if $\beta = \alpha^{p-1}$.

Proof: Note that $$
\begin{align}
&  &\frac{1}{p}+\frac{1}{q}&=1 \\[2mm]
&\implies &q+p &= pq \\[2mm]
&\implies &q+p + (1-p-q) &= pq+(1-p-q) \\[2mm]
&\implies &1 &= (p-1)(q-1) \\[2mm]
&\implies &\frac{1}{p-1} &= q-1
\end{align}
$$ so that for non-negative numbers $t$ and $u$, we have $u=t^{p-1} \iff t=u^{q-1}$. In other words, $u(t)=t^{p-1}$ and $t(u)=u^{q-1}$ are inverse functions.

Let $\alpha, \beta  \geq 0$. If either one is zero, the inequality is trivially true, so assume they are both positive. Then we have $$
\alpha \beta \leq \color{blue}{\int_{0}^{\alpha}{t^{p-1} \, dt}} + \color{red}{\int_{0}^{\beta}{u^{q-1} \, du}}
= \frac{\alpha^p}{p} + \frac{\beta^q}{q}
$$ The inequality follows from the fact that $\alpha \beta$ is the area of the rectangle in the figure below, whose top edge is at $u=\beta$, below $u(\alpha) = \alpha^{p-1}$ (since $u(t)$ is an increasing function). When $\beta=\alpha^{p-1}$, the inequality is an equality, since the rectangle's top edge would coincide with the upper tip of the blue region.


Note that the inequality is still true if $\beta > \alpha^{p-1}$, since in that case, $\alpha < \beta^{q-1}$; since $t(u)=u^{q-1}$ is also an increasing function, this would result in extra area of the red zone sticking out to the right of the $\alpha \beta$-rectangle.
$\square$
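
A quick numeric spot-check of the lemma (the values of $\alpha$, $\beta$, and $p$ below are arbitrary made-up choices):

```python
def young_gap(alpha, beta, p):
    q = p / (p - 1)   # the Holder conjugate of p, so that 1/p + 1/q = 1
    # Non-negative by Young's inequality, zero exactly when beta == alpha**(p-1).
    return alpha ** p / p + beta ** q / q - alpha * beta

print(young_gap(2.0, 3.0, 1.5))         # positive
print(young_gap(2.0, 2.0 ** 0.5, 1.5))  # ~0: the equality case beta = alpha**(p-1)
```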

Hölder's inequality: Let ${\bf x} \in \ell^p$ and ${\bf y} \in \ell^q$. Then $$
\sum_{i=1}^{\infty}{|x_i y_i|} \leq
\left( \sum_{i=1}^{\infty}{|x_i|^p} \right)^{1/p} \left( \sum_{i=1}^{\infty}{|y_i|^q} \right)^{1/q}
$$ Before giving the proof, I want to point out that

  1. If $p=q=2$ (this choice of $p$ and $q$ does satisfy the assumption $\tfrac{1}{p}+\tfrac{1}{q}=1$), then Hölder's inequality is just the infinite-sum version of the Cauchy-Schwarz inequality.
  2. The statement of Hölder's inequality can also be written in terms of the $p$-norms: $$\| {\bf z} \|_1 \leq \| {\bf x} \|_{p} \| {\bf y} \|_{q} $$ where ${\bf z}$ is the sequence whose $i$-th component is $z_i = x_i y_i$. So the inequality also implies that ${\bf z} \in \ell^1$.
Proof of Hölder's inequality: If either ${\bf x}$ or ${\bf y}$ is the zero vector, then the inequality is trivially true, so assume both are non-zero. Then $\|{\bf x}\|_{p}$ and $\|{\bf y}\|_{q}$ are non-zero, and so we can define the unit vectors ${\bf u} = \frac{1}{\|{\bf x}\|_p} {\bf x}$ and ${\bf v} = \frac{1}{\|{\bf y}\|_q} {\bf y}$. Then, by Young's inequality, $$
|u_i v_i| \leq \frac{|u_i|^p}{p} + \frac{|v_i|^q}{q} \tag{$\star$}
$$ for all $i \in {\Bbb N}$.

Since ${\bf x}$ (and thus ${\bf u}$) is in $\ell^p$, and similarly, ${\bf v} \in \ell^q$, the series $\sum_{i=1}^{\infty}{|u_i|^p}$ and $\sum_{i=1}^{\infty}{|v_i|^q}$ both converge. Using the comparison test and $( \star )$, we can conclude that the series $\sum_{i=1}^{\infty}{|u_i v_i|}$ also converges, and thus the sequence ${\bf w} = (u_i v_i)_{i \in {\Bbb N}}$ is in $\ell^1$.

Since ${\bf z} = \|{\bf x}\|_{p} \|{\bf y}\|_{q} {\bf w}$ (i.e. ${\bf z}$ is a scalar multiple of ${\bf w}$), ${\bf z} \in \ell^1$ as well.

Finally, by summing both sides of $( \star )$ from $i=1$ to $\infty$ and using the fact that ${\bf u}$ and ${\bf v}$ are unit vectors, we obtain$$
\|{\bf w}\|_{1} \leq \frac{1}{p}\|{\bf u}\|_{p}^{p} + \frac{1}{q}\|{\bf v}\|_{q}^{q} = \frac{1}{p}+\frac{1}{q} = 1
$$ and thus $$
\|{\bf z}\|_{1} = \|{\bf x}\|_{p} \|{\bf y}\|_{q} \|{\bf w}\|_{1} \leq \|{\bf x}\|_{p} \|{\bf y}\|_{q}
$$ $\square$
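
To see Hölder's inequality in action, here is a small numerical sketch; the sequences are made-up examples with only finitely many nonzero entries, so the infinite sums reduce to finite ones.

```python
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p, q = 3.0, 1.5                        # Holder conjugates: 1/3 + 2/3 = 1
x = np.array([1.0, -2.0, 0.5, 4.0])    # nonzero entries of a made-up sequence in l^p
y = np.array([0.3, 1.0, -1.0, 0.25])   # nonzero entries of a made-up sequence in l^q

lhs = np.sum(np.abs(x * y))            # ||z||_1, where z_i = x_i * y_i
rhs = lp_norm(x, p) * lp_norm(y, q)
print(lhs, rhs, lhs <= rhs)            # lhs <= rhs, as the inequality guarantees
```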

There are also (Lebesgue) integral versions of Young's inequality and Hölder's inequality which are used for the $L^p$ spaces, but the discrete versions essentially give us the same picture without requiring the machinery of measure theory.

I hope this post was helpful, and stay tuned for Part 2.

Convergence of Sequences of Functions

Preliminaries: Sets of Functions, How close is "close enough"?

The first post above introduced sets whose elements are functions and the associated naming convention; the second introduced open and closed sets in metric and topological spaces and the definitions of convergence appropriate to each. If you aren't familiar with these topics, then make sure to take a look at the preliminary posts before proceeding.

In this post, we'll investigate the different ways in which a sequence of functions can converge to a limit function. We'll see that different types of convergence correspond to different topologies on the relevant function space.

This is a long post- the first part is more intuitive, while the second part, from and including the Metrizability section, is proof-heavy. However, all proofs in this post require only the information in the preliminary posts and this post. As always, feel free to post any questions in the comments section if anything is unclear.


The Box Topology


Recall from the preliminary posts that the open intervals are a base for the usual topology on ${\Bbb R}$ (i.e. open sets are unions of open intervals) and that the function space ${\Bbb R}^{\Bbb N}$ can be viewed as an infinite Cartesian product of ${\Bbb R}$. In other words, we can think of ${\Bbb R}^{\Bbb N}$ as one copy of the number line (the possible function values) for each number $1,2,3,\dotsc$ (the function inputs). ${\Bbb R}^{\Bbb R}$ is analogous but is an uncountably infinite product of the number line.

For now, let's focus on ${\Bbb R}^{\Bbb N}$ as the notation is a bit simpler. To create a topology on this Cartesian product, perhaps the most obvious idea would be to define open sets as Cartesian products of open sets in the real number line or, equivalently, Cartesian products of the basic open sets, the open intervals. This is the box topology (we'll see below why this is not good enough to warrant the name product topology).

Definition of the Box Topology: Let $U_1, U_2, U_3, \dotsc$ be open sets in the real number line ${\Bbb R}$. The set of functions $f : {\Bbb N} \rightarrow {\Bbb R}$ which satisfy $f(x) \in U_x$ for each $x=1,2,3, \dotsc$ is called an open box in ${\Bbb R}^{\Bbb N}$. The topology in which open sets are unions of open boxes (i.e. the topology generated by the open boxes) is called the box topology.
$\square$

More concretely, let $U_1 = (a_1, b_1), U_2 = (a_2, b_2), U_3 = (a_3, b_3), \dotsc$ be open intervals in ${\Bbb R}$. Then the set of functions $$
\begin{align}
&\{ f: {\Bbb N} \rightarrow {\Bbb R} \ | \ (f(1), f(2), f(3), \dotsc ) \in U_1 \times U_2 \times U_3 \times \dotsb \} \\

= \ &\{ f: {\Bbb N} \rightarrow {\Bbb R} \ | \ f(1) \in U_1 \ \ {\rm and} \ \ f(2) \in U_2 \ \ {\rm and} \ \ f(3) \in U_3 \ \ {\rm and} \ \dotsb \}
\end{align}
$$ is an open box in ${\Bbb R}^{\Bbb N}$ and thus an open set in the box topology.

Though the box topology is the most obvious one, it turns out to be basically useless because its open sets are too restrictive: a single open box can constrain all of the (infinitely many) function values at once, to windows of different sizes. For example, in any "reasonable" topology, the following sequence of functions $(\phi_n)$ should converge to the zero function: $$
\begin{align}
\phi_1 &= \left( \, 1 \, , 1 \, , 1 \, , \dotsc \right) \\
\phi_2 &= \left( \tfrac{1}{2},\tfrac{1}{2}, \tfrac{1}{2}, \dotsc \right) \\
\phi_3 &= \left( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \dotsc \right) \\
& \ \ \vdots
\end{align}
$$ However, this sequence does not converge in the box topology: the open box $B = (-1,1) \times (-\frac{1}{2}, \frac{1}{2}) \times (-\frac{1}{3}, \frac{1}{3}) \times \dotsb$ contains the zero function ${\bf 0} = (0,0,0, \dotsc )$ and so is a neighborhood of ${\bf 0}$, but $B$ does not contain any of the $\phi_n$'s. So $\phi_n \not \rightarrow {\bf 0}$ in the box topology.


Pointwise Convergence: The Product Topology


So the box topology is too restrictive to correspond to any reasonable type of convergence, but a slight modification fixes the issue presented above. If we only allow open sets to constrain function values at a finite number of input values, then we obtain the much more useful product topology.

Definition of the Product Topology: Let $U_1, U_2, U_3, \dotsc$ be open sets in the real number line ${\Bbb R}$ with the restriction that only finitely many of the $U_i$'s are not all of ${\Bbb R}$. The sets of functions $f : {\Bbb N} \rightarrow {\Bbb R}$ which satisfy $f(x) \in U_x$ for each $x=1,2,3, \dotsc$ are the basic open sets of the product topology; as with the box topology, open sets are unions of these basic open sets.
$\square$

The finitely many values of $i$ for which $U_i \neq {\Bbb R}$ are the finitely many values for which a particular open set restricts the member functions' values $f(i)$ to lie in $U_i$. For example, a typical open set in the product topology on ${\Bbb R}^{\Bbb N}$ looks like $$
S = \left\{ f: {\Bbb N} \rightarrow {\Bbb R} \ | \ f(2) \in (3,7) \ {\rm and} \ f(103) \in \left(-56, \tfrac{1}{5} \right) \ {\rm and} \ f(200) \in (0,1) \right\}
$$ Note that there is no "..." at the end of the set definition (as there was in the example shown for the box topology), as $S$ only restricts function values for the 3 input values.

The sequences of functions which converge in the product topology are exactly those which converge pointwise.

Definition of Pointwise Convergence: Let $(f_n)$ be a sequence of functions in ${\Bbb R}^{\Bbb N}$. We say $(f_n)$ converges pointwise to a function $f$ if, for each individual $x \in {\Bbb N}$, the sequence of real numbers $f_{1}(x), f_{2}(x), f_{3}(x), \dotsc$ converges to $f(x)$ (in the usual topology on ${\Bbb R}$).
$\square$

The definition for ${\Bbb R}^{\Bbb R}$ is analogous with ${\Bbb N}$ replaced by ${\Bbb R}$.

As an example, the following sequence of functions $(f_n)$ converges pointwise to the zero function: $$
\begin{align}
f_1 &= \left( \, 1 \, , 2 \, , 3 \, , \color{red}{4} \, , 5 \, , \dotsc \right) \\

f_2 &= \left( \tfrac{1}{2},\tfrac{2}{2}, \tfrac{3}{2}, \color{red}{\tfrac{4}{2}}, \tfrac{5}{2}, \dotsc \right) \\

f_3 &= \left( \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{3}{3}, \color{red}{\tfrac{4}{3}}, \tfrac{5}{3}, \dotsc \right) \\

& \ \ \vdots
\end{align}
$$ since for each fixed $x$, such as the red column corresponding to $x=4$, the sequence $f_{n}(x)$ is $x, \tfrac{x}{2}, \tfrac{x}{3}, \dotsc$, which clearly converges to $0$.

For a less straightforward example, this time in ${\Bbb R}^{[0, 2 \pi]}$, consider the sequence of functions $(f_n)$ defined by $f_{n}(x) = n \sin \left( \frac{x}{n} \right)$ for $x \in [0, 2 \pi ]$. This sequence converges pointwise to $f(x) = x$, as we can see from the following diagram of the first 20 functions in the sequence, along with the graph of $f(x)=x$:



To formally prove that $\displaystyle{\lim_{n \rightarrow \infty}{n \sin \left( \tfrac{x}{n} \right)} = x}$ for $x \in [0, 2 \pi ]$, let $x$ be fixed, set $t = x / n$, and then use the fact that $\displaystyle{\lim_{t \rightarrow 0}{\frac{\sin (t)}{t}} = 1}$. In this second example, the sequence $f_{n}(x)$ converges faster to $f(x)$ closer to $x=0$ than it does closer to $x = 2 \pi$ (you can see in the diagram that the sequence takes longer to "close the gap" on the right-hand side); this is fine for pointwise convergence.
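
Here's a short numeric sketch of that pointwise limit: for each fixed $x$, the values $n \sin(x/n)$ approach $x$ as $n$ grows, just more slowly for $x$ near $2\pi$ than for $x$ near $0$.

```python
import numpy as np

xs = np.array([0.5, np.pi, 2 * np.pi])    # a few fixed input values x
for n in (1, 10, 100, 1000):
    print(n, n * np.sin(xs / n))          # each column approaches the corresponding x
# e.g. the last column runs 0.0, 5.88, 6.28, 6.283... -> 2*pi, the slowest of the three
```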

Now, for the main event of this section- normally, I would leave this for the end of the post, but since the whole point of this post is to connect the different notions of convergence with their respective topologies, this one is worth reading even for those who aren't normally into proofs:

Convergence in the product topology is the same as pointwise convergence: Let $(f_n)$ be a sequence of functions in ${\Bbb R}^{\Bbb N}$ and let $f$ be another function (in the same space). Then $f_n \rightarrow f$ in the product topology if and only if $f_n \rightarrow f$ pointwise.

Proof: Suppose $f_n \rightarrow f$ in the product topology. Then, for every open set (in the product topology) $U$ containing $f$, $f_n \in U$ for all but finitely many (say the first $N$) values of $n$. This is simply the definition of convergence in a topological space, as discussed in the last post. Consider a fixed value $x \in {\Bbb N}$, and let $S$ be an open set containing $f(x)$ in ${\Bbb R}$ (for example, $S = (f(x) - \epsilon, f(x) + \epsilon )$ for some $\epsilon >0$ will work). Then the set of functions $$
U = \{ g: {\Bbb N} \rightarrow {\Bbb R} \ | \ g(x) \in S \}
$$ is an open set in the product topology which contains $f$. Thus, $f_n \in U$ for all $n > N$, which means that $f_{n}(x) \in S$ for all $n > N$. Since $S$ was an arbitrary neighborhood of $f(x)$ in ${\Bbb R}$, this shows that $\displaystyle{\lim_{n \rightarrow \infty}{f_{n}(x)} = f(x)}$, so $f_n \rightarrow f$ pointwise (in the usual topology on ${\Bbb R})$.

Conversely, suppose $f_n \rightarrow f$ pointwise. Then for all $x$, and for any open set $S$ in ${\Bbb R}$ which contains $f(x)$, $f_{n}(x) \in S$ for all $n > N$, for some finite $N$. Now, a general open set in the product topology containing $f$ is of the form $$
U = \{ g: {\Bbb N} \rightarrow {\Bbb R} \ | \ g(x_1) \in S_1 \ {\rm and} \ g(x_2) \in S_2 \ {\rm and} \ \cdots \ {\rm and} \ g(x_k) \in S_k\}
$$ where $x_1, x_2, \dotsc , x_k$ are the finitely many input values at which $U$ constrains the functions, and each $S_i$ is a neighborhood (in ${\Bbb R}$) of $f(x_i)$. By pointwise convergence, $f_{n}(x_i) \in S_i$ for all $n > N_i$, for each of the $k$ values of $i$. Therefore, $f_n \in U$ for all $n > N_{\rm max} = \max \{ N_1, N_2, \dotsc , N_k \}$. This proves that $f_n \rightarrow f$ in the product topology.
$\square$

The fact that an open set in the product topology can only constrain finitely many function values was key in the proof, because it allowed us to take the maximum of the $N_i$'s at the end. In the box topology, we could have an example like the sequence $(\phi_n)$ shown above, where $N_1 = 1, N_2 = 2, N_3 = 3, \dotsc \ $, so the $N_i$'s have no maximum (or even an upper bound), and the argument breaks down.

Since this proof didn't rely on anything in particular about ${\Bbb N}$ or ${\Bbb R}$, the same proof holds for the product topology on any function space $X^Y$. Since convergence in the product topology is equivalent to pointwise convergence, mathematicians also refer to the product topology as the topology of pointwise convergence.


Some problems with pointwise convergence...


The examples shown above of sequences of functions which converge pointwise to a limit function all truly fit with the common-sense notion of convergence of functions. Though the sequences converged faster at some input values than others, they all got there eventually at every input value. Now I'm going to show you some nastier examples which highlight the pitfalls with pointwise convergence.

Runaway Averages: Consider the following sequence of functions ${\Bbb N} \rightarrow {\Bbb R}$: $$
\begin{align}
f_1 &= \left( 1,1,1,1,1,1, \dotsc \right) \\
f_2 &= \left( 0,2,2,2,2,2, \dotsc \right) \\
f_3 &= \left( 0,0,3,3,3,3, \dotsc \right) \\
f_4 &= \left( 0,0,0,4,4,4, \dotsc \right) \\
& \ \ \vdots
\end{align}
$$ This sequence converges pointwise to the zero function since for each $x$, $f_{n}(x) = 0$ for all $n > x$. But for every $n$, at the (infinitely many) values of $x$ for which $f_{n}(x) \neq 0$, the functions actually get further away from zero, not closer to it as we'd expect of a "convergent" sequence. The average value of $f_n$ diverges to infinity as $n$ increases.
$\square$

Tricky Triangles: The sequence of functions defined by $$
f_{n}(x) =
\cases{
n^{2}x & \text{if} \ \ \ \ 0 \leq x \leq 1/n \\
2n - n^{2}x & \text{if} \ \ \ \ 1/n < x \leq 2/n \\
0 & \text{if} \ \ \ \ x > 2/n
}
$$ yields triangles of height $n$ and constant area $1$, anchored at the point $(0,0)$, as shown in the below diagram of the first 3 functions:



It's clear from the picture that this also converges pointwise to the zero function, and the area of the triangles even stays constant, but the increasingly tall spikes are not consistent with our intuitive notion of "convergence."
$\square$
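
The following sketch makes the tension explicit: at any fixed $x$, the values $f_n(x)$ are eventually $0$, yet the peak height $\sup_x f_n(x) = n$ keeps growing.

```python
import numpy as np

def f(n, x):
    # the "tricky triangle" of height n, supported on [0, 2/n]
    x = np.asarray(x, dtype=float)
    return np.where(x <= 1.0 / n, n * n * x,
                    np.where(x <= 2.0 / n, 2 * n - n * n * x, 0.0))

x0 = 0.1
for n in (1, 5, 20, 100):
    print(n, f(n, x0), f(n, 1.0 / n))   # f_n(0.1) and the peak value f_n(1/n) = n
# f_n(0.1) -> 0 once 2/n <= 0.1, but the peaks 1, 5, 20, 100 grow without bound
```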

We need something more restrictive than pointwise convergence to avoid these troublesome cases, but also something not as constricting as the box topology.


Uniform Convergence


In a function space $Y^X$ of functions $X \rightarrow Y$, if $Y$ is a metric space (like ${\Bbb R}$ for example), then we can define a measure of the "maximum" distance between two functions.

If we have two functions $f,g \in Y^X$, and a metric $d$ on $Y$, then we define the uniform distance between $f$ and $g$, denoted $\rho (f,g)$, to be the least upper bound (or supremum) of the distances between $f(x)$ and $g(x)$ at all values of $x$ in the domain $X$. Formally, $$
\rho (f,g) = \sup_{x \in X} \{ d(f(x),g(x)) \}
$$ This should be thought of as the maximum distance between the two functions, but we need to use the least upper bound instead of maximum in case one function has an asymptote and thus never achieves the "maximum" distance. If there is no upper bound on these distances, then $\rho (f,g) = \infty$.
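
In practice one can estimate $\rho(f,g)$ by taking a maximum over a fine grid of inputs; this only approximates the supremum (and a genuinely unbounded gap would just show up as a huge number rather than $\infty$). The two functions below are made-up examples, with $f$ being one reading of the function in the diagram that follows.

```python
import numpy as np

def uniform_distance(f, g, xs):
    """Approximate rho(f, g) = sup |f(x) - g(x)| by a maximum over the grid points xs."""
    return np.max(np.abs(f(xs) - g(xs)))

xs = np.linspace(0.0, 1.0, 10_001)
f = lambda x: np.cos(3 * x) ** 2 + 4        # roughly the f from the diagram below
g = lambda x: f(x) + 0.3 * np.sin(20 * x)   # a made-up function staying close to f

print(uniform_distance(f, g, xs))           # about 0.3 < 1/2, so g is within uniform distance 1/2 of f
```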

As an example, the below diagram shows the function $f(x) = \cos (3x)^{2} + 4$ on the interval $x \in [0,1]$ and some random function $g(x)$ which has a uniform distance of less than $\epsilon = 1/2$ to $f(x)$.



Uniform distance allows us to define a notion of convergence which fits our intuitive notion while avoiding the pitfalls of pointwise convergence.

Definition of uniform convergence: Let $(f_n)$ be a sequence of functions in ${\Bbb R}^X$ for some domain $X$. We say $(f_{n})$ converges uniformly to a function $f$ if $\rho (f_{n} , f) \rightarrow 0$ as $n \rightarrow \infty$. In other words, for any $\epsilon > 0$, there exists an $N$ such that $\rho( f_{n} , f) < \epsilon$ for all $n>N$.
$\square$

A sequence $(f_n)$ converges uniformly to $f$ if, for large enough values of $n$, the graphs of the functions $f_n$ fit within arbitrarily small "ranges" (between the red and green lines in the above diagram) around the graph of $f$. The sequence $(\phi_n)$ above converges uniformly to the zero function, but the other examples which converge pointwise to $\bf 0$ do not converge uniformly, since they have growing protrusions as $n$ increases.

Like pointwise convergence, uniform convergence is equivalent to convergence in a particular topology, called the uniform topology.

Definition of the Uniform Topology: Let $f \in {\Bbb R}^X$ for some domain set $X$, and let $\epsilon > 0$. Then the sets $$
B_{\rho}(f, \epsilon) = \left\{ g \in {\Bbb R}^X \ | \ \rho(f,g)<\epsilon \right\}
$$ are a base for the uniform topology, i.e. open sets are unions of sets of the form $B_{\rho}(f,\epsilon)$.
$\square$

From this definition, it is immediately clear that convergence in the uniform topology is the same as uniform convergence as defined above.

One important note on the uniform topology: an open box such as $B_1 = (-1,1) \times (-1,1) \times \dotsb$ in ${\Bbb R}^{\Bbb N}$ is not the same as $B_{\rho}({\bf 0}, 1)$. The reason is that a function such as $g = (\tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{4}{5}, \dotsc)$ approaches $1$ asymptotically, so its supremum (least upper bound) is $1$, and thus $\rho({\bf 0}, g) = 1 \nless 1$, which means $g \notin B_{\rho}({\bf 0}, 1)$. But all of $g$'s values are less than $1$ (i.e. it never actually achieves its least upper bound), so $g \in B_1$.

It can actually be shown without too much difficulty that open boxes like $B_1$ are not even open sets in the uniform topology. Furthermore, although $B_{\rho}({\bf 0}, 1) \neq B_1$, we do have $B_{\rho}({\bf 0}, 1) = \bigcup_{\delta < 1}{B_{\delta}}$, where $B_{\delta} = (-\delta, \delta) \times (-\delta, \delta) \times \dotsb$.


Metrizability


It wouldn't be unreasonable to think that the uniform topology is the metric topology of the metric $\rho$. Unfortunately, $\rho$ isn't even a metric, and that's because it can take the value $\infty$. However, we can solve this issue rather easily by defining the bounded uniform metric $$
\bar{\rho}(f,g) = \min \{ \rho(f,g),1 \}
This is a legitimate metric (it isn't too hard to verify that it satisfies the 3 axioms) since capping $\rho$ at $1$ solves the infinity issue. As in the case of Euclidean space (refer to the preliminary post), the open sets defined via this metric (i.e. the metric topology) are those sets which contain an open ball $B_{\bar{\rho}}(f,\epsilon)$ around each of their points $f$. Now, any set containing some open ball around each of its points also contains many open balls of smaller radii around the same points. This means that the cap of $1$ on the value of the metric has no effect on which sets are open, and by extension, on which sequences converge. In other words, the metric topology of $\bar{\rho}$ is the same as the uniform topology (i.e. the one generated by "open balls" in the "almost metric" $\rho$).

Note: It is actually always the case (by this same logic) that the metric topology of a metric $d$ is the same as that of $\bar{d} = \min \{ d,1 \}$. Although our $\rho$ wasn't even a metric to begin with, the logic is still the same.

Even without knowing the above, we can conclude directly based on the definition of $\bar{\rho}$ via $\rho$ that convergence in this metric space is the same as convergence in the uniform topology, since for convergence, we only care about (arbitrarily) small distances, so capping the metric at 1 has no effect on which sequences converge.
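Here is a minimal sketch of $\bar{\rho}$, assuming we only look at finite truncations of sequences (so the computed value is merely a lower bound on the true supremum), together with a spot-check of the metric axioms on a few random samples.

```python
import random

# Work with sequences truncated to their first N terms; rho is approximated by
# a max over those terms.
N = 1000

def rho(f, g):
    return max(abs(f[i] - g[i]) for i in range(N))

def rho_bar(f, g):
    # The bounded uniform metric: capping at 1 removes the possibility of an
    # infinite distance without changing which sets are open.
    return min(rho(f, g), 1.0)

random.seed(0)
f, g, h = ([random.uniform(-5, 5) for _ in range(N)] for _ in range(3))

# Spot-check the metric axioms on these samples.
assert rho_bar(f, f) == 0.0
assert rho_bar(f, g) == rho_bar(g, f)
assert rho_bar(f, h) <= rho_bar(f, g) + rho_bar(g, h) + 1e-12
print(rho(f, g), rho_bar(f, g))   # rho may exceed 1; rho_bar never does
```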

The metrizability of the product/pointwise topology and box topology is less obvious. Let's investigate, focusing on ${\Bbb R}^{\Bbb N}$ and ${\Bbb R}^{\Bbb R}$.

As you may have expected, the troublesome box topology is too restrictive to be metrizable. The product topology's metrizability is a bit trickier and depends on whether the Cartesian product in question is countable or not. To summarize the results: the uniform topology is metrizable via $\bar{\rho}$ as discussed above (albeit only informally); the product topology on ${\Bbb R}^{\Bbb N}$ is metrizable via the (non-unique) metric $D$ defined below; and neither the box topology on ${\Bbb R}^{\Bbb N}$ nor the product topology on ${\Bbb R}^{\Bbb R}$ is metrizable.

The below proofs justify these results.

Proof that the box topology on ${\Bbb R}^{\Bbb N}$ is not metrizable: Suppose that $d$ is a metric on ${\Bbb R}^{\Bbb N}$ whose metric topology is the same as the box topology.

Consider the open balls $B_{d}({\bf 0}, 1), B_{d}({\bf 0}, \tfrac{1}{2}), B_{d}({\bf 0}, \tfrac{1}{3}), \dotsc$ around the zero function ${\bf 0} = (0,0,0,\dotsc)$ in ${\Bbb R}^{\Bbb N}$. By assumption, each of these is open in the box topology, which means that each one contains an open box around ${\bf 0}$; by shrinking these boxes if necessary, we can take the containments to be strict. So there exist positive numbers $a_{ij}$ such that $$
\begin{align}
B_{d}({\bf 0}, \, 1) &\supsetneq (-a_{11}, a_{11}) \times (-a_{12},a_{12}) \times (-a_{13},a_{13}) \times \dotsb \tag{1}\\

B_{d}({\bf 0}, \tfrac{1}{2}) &\supsetneq (-a_{21}, a_{21}) \times (-a_{22},a_{22}) \times (-a_{23},a_{23}) \times \dotsb \tag{2}\\

B_{d}({\bf 0}, \tfrac{1}{3}) &\supsetneq (-a_{31}, a_{31}) \times (-a_{32},a_{32}) \times (-a_{33},a_{33}) \times \dotsb \tag{3}\\

& \ \ \vdots
\end{align}
$$ We can assume the intervals along each column are shrinking, i.e. $a_{1j} \geq a_{2j} \geq a_{3j} \geq \dotsb \ ( \star )$. This is without loss of generality, since any interval contains many smaller ones as well. Consider the open box $B$ formed by the intervals on the diagonal: $$
B = (-a_{11},a_{11}) \times (-a_{22},a_{22}) \times (-a_{33},a_{33}) \times \dotsb
$$ This is a neighborhood of ${\bf 0}$ in the box topology and thus, by assumption, is an open set in the metric topology. So $B$ must contain an open ball $B_{d}({\bf 0},r)$ centered at ${\bf 0}$ for some radius $r$. In particular, $B$ must contain $B_{d}({\bf 0}, \tfrac{1}{n})$ for some $n$ large enough that $1/n < r$. However, this is a contradiction: for instance, if $B$ contained $B_{d}({\bf 0}, 1)$, then each sequence (i.e. element of ${\Bbb R}^{\Bbb N}$) in that ball would be in $B$. In particular, for each sequence $(x_1, x_2, x_3, \dotsc ) \in B_{d}({\bf 0},1)$, we would have $$
\begin{align}
x_1 &\in (-a_{11},a_{11}) \\
x_2 &\in (-a_{22},a_{22}) \buildrel{( \star )} \over{\subseteq} (-a_{12},a_{12}) \\
x_3 &\in (-a_{33},a_{33})  \buildrel{( \star )} \over{\subseteq} (-a_{23},a_{23}) \buildrel{( \star )} \over{\subseteq} (-a_{13},a_{13}) \\
& \ \ \vdots
\end{align}
$$ But the $\supsetneq$ in $(1)$ above tells us that at least one sequence $(x_n)$ exists for which this is not the case. The same logic precludes the rest of the open balls $B_{d}({\bf 0}, \tfrac{1}{n})$ from being fully contained in $B$, so we conclude that no such metric $d$ can exist.
$\square$

Proof that ${\Bbb R}^{\Bbb N}$ with the product topology is metrizable: Let $f$ and $g$ be sequences in ${\Bbb R}^{\Bbb N}$. Let $d(x,y) = |y-x|$ be the usual metric on the real line and $\bar{d}(x,y) = \min \{ d(x,y), 1 \}$. Define the metric $D$ on ${\Bbb R}^{\Bbb N}$ by $$
D(f,g) = \sup_{n \in {\Bbb N}} \left\{ \frac{\bar{d}(f(n),g(n))}{n} \right\}
$$ $D$ is indeed a metric, which I won't prove here. We will show that open sets in the metric topology of $D$, which we'll call $\mathscr{T}_D$, are open sets in the product topology $\mathscr{T}$, and vice versa.
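As an illustration of this metric (not part of the proof), here is a sketch that approximates $D$ using only the first $N$ coordinates; since $\bar{d} \leq 1$, the neglected tail contributes at most $1/(N+1)$.

```python
def d_bar(x, y):
    # Bounded metric on R: min(|x - y|, 1).
    return min(abs(x - y), 1.0)

def D(f, g, N=10_000):
    # D(f, g) = sup_n d_bar(f(n), g(n)) / n, approximated using the first N
    # coordinates. Since d_bar <= 1, the coordinates beyond N contribute at
    # most 1/(N+1), so for large N this is a good approximation.
    return max(d_bar(f(n), g(n)) / n for n in range(1, N + 1))

# Example: the zero sequence versus g(n) = n/(n+1). The coordinate-wise
# differences approach 1, but dividing by n forces the supremum to be
# attained at n = 1, so D(0, g) = 1/2.
zero = lambda n: 0.0
g = lambda n: n / (n + 1)
print(D(zero, g))   # 0.5
```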

$\mathscr{T}_D \subseteq \mathscr{T}$:
Let $U$ be an open set in the metric topology $\mathscr{T}_D$, and let $f = (x_1, x_2, x_3, \dotsc ) \in U$. By the definition of open sets in the metric topology, there exists an $\epsilon > 0$ such that $B_{D}(f, \epsilon) \subseteq U$. Choose an $N \in {\Bbb N}$ large enough that $1/N < \epsilon$, and let $g = (y_1, y_2, y_3, \dotsc )$ be another sequence in ${\Bbb R}^{\Bbb N}$. Since $\bar{d}$ is capped at $1$, we have $\frac{\bar{d}(x_i , y_i)}{i} \leq \frac{1}{i} < \frac{1}{N}$ for all $i>N$. In other words, $\frac{1}{N}$ is an upper bound on $\frac{\bar{d}(x_i , y_i)}{i}$ for $i>N$.

$D(f,g)$ is the least upper bound of the numbers $\frac{\bar{d}(x_i, y_i)}{i}$, so by the above, we have $$
D(f,g) \leq \max \left\{ \bar{d}(x_1, y_1), \frac{\bar{d}(x_2, y_2)}{2}, \frac{\bar{d}(x_3, y_3)}{3}, \dotsc , \frac{1}{N} \right\} \tag{$\dagger$}
$$ Define the set $$
V = (x_1-\epsilon, x_1+\epsilon) \times (x_2-\epsilon, x_2+\epsilon) \times \dotsb \times (x_N-\epsilon, x_N+\epsilon) \times {\Bbb R} \times {\Bbb R} \times \dotsb
$$ which is clearly an open neighborhood of $f$ in the product topology $\mathscr{T}$. Let $g \in V$. By the definition of $V$, $|y_1-x_1| < \epsilon, |y_2-x_2| < \epsilon, \dotsc , |y_N-x_N|<\epsilon$. Thus, by $( \dagger )$ and the fact that $\frac{1}{N}<\epsilon$, we have $D(f,g) < \epsilon$, i.e. $g \in B_{D}(f, \epsilon) \subseteq U$. This proves that $V \subseteq U$, so $U$ contains an open neighborhood (in $\mathscr{T}$) of each of its points and is thus an open set in $\mathscr{T}$.

$\mathscr{T} \subseteq \mathscr{T}_D$:
Let $U$ be a basic open set in the product topology $\mathscr{T}$, so $U = \prod_{i \in {\Bbb N}}{U_i}$, where each $U_i$ is open in the usual topology on ${\Bbb R}$ and only finitely many of the $U_i$ are not all of ${\Bbb R}$ (this suffices, since a general open set in $\mathscr{T}$ is a union of such basic sets). Define $J$ as the finite set of indices $j$ for which $U_j \neq {\Bbb R}$. Let $f = (x_1, x_2, x_3, \dotsc ) \in U$. Since each $U_j$ is an open set in ${\Bbb R}$, there exists an $\epsilon_j>0$ such that $(x_j - \epsilon_j, x_j + \epsilon_j) \subseteq U_j$ for each $j \in J$; we can assume without loss of generality that each $\epsilon_j < 1$, so that we won't need to worry about the cap of $1$ on $\bar{d}$ below. Set $\epsilon = \frac{1}{2} \cdot \min_{j \in J} \left\{ \frac{\epsilon_j}{j} \right\}$.

We will prove that $B_{D}(f, \epsilon)$, an open neighborhood of $f$ in the metric topology $\mathscr{T}_D$, is contained in $U$, which implies that $U$ is open in $\mathscr{T}_D$. To that end, let $g = (y_1, y_2, y_3, \dotsc ) \in B_{D}(f, \epsilon)$. Then, for each $j \in J$, we have $$
\begin{align}
& &&\frac{\bar{d}(x_j, y_j)}{j} &&< \ \ \epsilon \tag{def. of $B_{D}(f, \epsilon)$}\\[2mm]
&\implies &&\frac{\bar{d}(x_j, y_j)}{j} &&< \ \ \frac{\epsilon_j}{j} \tag{def. of $\epsilon$} \\[2mm]
&\implies &&\bar{d}(x_j, y_j) &&< \ \ \epsilon_j \\[2mm]
&\implies &&d(x_j, y_j) &&< \ \ \epsilon_j \tag{since $\epsilon_j < 1$} \\[2mm]
&\implies &&y_j &&\in \ \ (x_j - \epsilon_j, x_j + \epsilon_j) \subseteq U_j
\end{align}
$$ Furthermore, for each $k \notin J$, $U_k = {\Bbb R}$, so it is trivially the case that $y_k \in U_k$. This proves that $g \in U$, so $B_{D}(f, \epsilon) \subseteq U$.
$\square$

Before the final proof to complete the results above, we need a lemma.

Lemma: In a metric space $(X,d)$, every closed set can be written as the intersection of countably many open sets.

Proof: For a set $S \subseteq X$ and a point $x \in X$, define the distance between the point and the set to be $d(x,S) = \inf_{s \in S} \left\{ d(x,s) \right\}$, where $\inf$ is the infimum or greatest lower bound.



Assume $S$ is a closed set. Then we have $$
S = \bigcap_{k=1}^{\infty}{\left\{ x \in X \ | \ d(x,S)< \frac{1}{k} \right\}}
$$ a countable intersection of open sets in the metric topology. Each set in the intersection is open: if $d(x,S) < \tfrac{1}{k}$, then any $y$ with $d(x,y) < \tfrac{1}{k} - d(x,S)$ also satisfies $d(y,S) < \tfrac{1}{k}$ by the triangle inequality. And the intersection equals $S$: any $x \in S$ has $d(x,S) = 0 < \tfrac{1}{k}$ for every $k$, while any $x$ lying in every set of the intersection has $d(x,S) < \tfrac{1}{k}$ for every $k$, i.e. $d(x,S) = 0$, so $x$ is either in $S$ or a limit point of $S$; since $S$ is closed, $x \in S$ either way.


$\square$
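As a concrete instance of the lemma, take $X = {\Bbb R}$ with the usual metric and the closed set $S = [a,b]$. Then $\left\{ x \ | \ d(x,S) < \tfrac{1}{k} \right\} = \left( a - \tfrac{1}{k}, \ b + \tfrac{1}{k} \right)$, and indeed $$
[a,b] = \bigcap_{k=1}^{\infty}{\left( a - \frac{1}{k}, \ b + \frac{1}{k} \right)},
$$ since any point outside $[a,b]$ is excluded from the intervals as soon as $\tfrac{1}{k}$ drops below its distance to the interval.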

Proof that ${\Bbb R}^{\Bbb R}$ with the product topology is not metrizable: Assume the product topology on ${\Bbb R}^{\Bbb R}$ is metrizable by a metric $d$. In a metric space, a set containing a single point is always a closed set, so the set containing only the zero function, $S = \{ {\bf 0} \}$, is closed. By the lemma, $S \buildrel{( \spadesuit )} \over{=} \bigcap_{k=1}^{\infty}{G_k}$ for some open sets $G_k$. Since it is an intersection, $S \subseteq G_k$ for each $k$, which means ${\bf 0} \in G_k$ for each $k$.

For all $k$, since $G_k$ is open in the product topology, there exists an $\epsilon_k > 0$ and a finite set of input values $X_k$ such that $$
\left\{ f: {\Bbb R} \rightarrow {\Bbb R} \ | \ |f(x)-0|<\epsilon_k \ \ \forall \ x \in X_k \right\} \subseteq G_k \tag{$\ddagger$}
$$ Let $X=\bigcup_{k=1}^{\infty}{X_k}$. This is a countable union of finite sets and thus a countable set, so it is not all of ${\Bbb R}$. The function $$
g(x) =
\cases{
0 & \text{if } x \in X \\
1 & \text{if } x \notin X
}
$$ is a member of $\bigcap_{k=1}^{\infty}{G_k}$ since it is included in the sets mentioned in $( \ddagger )$, but $g \neq {\bf 0}$ since $X \subsetneq {\Bbb R}$. This contradicts $( \spadesuit )$, so we conclude that no such metric $d$ can exist.
$\square$


Comparing the topologies


Almost no sequences converge in the box topology, as discussed above, but any sequence that does converge also converges uniformly. This is because constraining each function value separately, with possibly different (and possibly shrinking) tolerances per coordinate, is more restrictive than constraining all function values at once with the same tolerance.

More obvious is the fact that any sequence that converges uniformly also converges pointwise, because if we can constrain all function values within a certain distance of the limit function, then, in particular, we can constrain any single function value within that distance.

So convergence in the box topology implies uniform convergence, which implies pointwise convergence. Equivalently, all open sets in the topology of pointwise convergence are also open in the uniform topology, and all open sets in the uniform topology are also open in the box topology. This is because convergence of a sequence in a topological space is defined as eventual inclusion of the sequence points in any given neighborhood of the limit: more open sets means more neighborhoods, making it harder to find convergent sequences.

The following proof formalizes the above intuition.

Proof that $\mathscr{T}_{\rm product} \subset \mathscr{T}_{\rm uniform} \subset \mathscr{T}_{\rm box}$: In this proof, we'll assume we're dealing with ${\Bbb R}^{\Bbb N}$ to keep the notation simpler, but the same logic works for any function space $Y^X$ (where $Y$ is a metric space so that the uniform topology exists).

$\mathscr{T}_{\rm product} \subset \mathscr{T}_{\rm uniform}$:
Let $U$ be an open set in ${\Bbb R}$, and define $S(x,U) = \left\{ f \in {\Bbb R}^{\Bbb N} \  | \ f(x) \in U \right\}$, i.e. a set constraining only a single function value to lie within $U$. Note that open sets in the product topology are all finite intersections of these types of sets (we say they form a subbase for the product topology). Suppose $f \in S(x,U)$. Then $f(x) \in U$, an open set, which means there exists an $\epsilon > 0$ such that $B_{\epsilon}(f(x)) \subset U$. By the same logic, any function with uniform distance less than $\epsilon$ to $f$, i.e. any element of $B_{\rho}(f, \epsilon)$, is also in $S(x,U)$, which means $S(x,U)$ contains an open neighborhood (in the uniform topology) of each of its points. So $S(x,U)$ is an open set in $\mathscr{T}_{\rm uniform}$.

A basic open set in $\mathscr{T}_{\rm product}$ is a finite intersection of $S(x,U)$'s, which is open in $\mathscr{T}_{\rm uniform}$ as a finite intersection of sets we just showed to be open (concretely, take the smallest of the finitely many $\epsilon$'s from the process above, which exists since a finite set of numbers always has a minimum). A general open set in $\mathscr{T}_{\rm product}$ is a union of such basic sets, hence also open in $\mathscr{T}_{\rm uniform}$, which means $\mathscr{T}_{\rm product} \subset \mathscr{T}_{\rm uniform}$.

$\mathscr{T}_{\rm uniform} \subset \mathscr{T}_{\rm box}$:
Since the open balls $B_{\rho}(f, \epsilon)$ are a base for $\mathscr{T}_{\rm uniform}$, i.e. open sets are unions of such open balls, we just need to prove that these open balls are open in the box topology. To that end, let $f = (x_1, x_2, x_3, \dotsc)$, $\epsilon > 0$, and let $g = (y_1, y_2, y_3, \dotsc) \in B_{\rho}(f, \epsilon)$.

Since $g \in B_{\rho}(f, \epsilon)$, the quantity $\epsilon_2 = \epsilon - \rho(f,g)$ is positive, and the triangle inequality gives $B_{\rho}(g, \epsilon_2) \subset B_{\rho}(f, \epsilon)$. Furthermore, $B_{\rho}(g, \epsilon_2)$ contains an open neighborhood of $g$ in the box topology, namely, the open box $$
(y_1 - \frac{\epsilon_2}{2}, y_1 + \frac{\epsilon_2}{2}) \times
(y_2 - \frac{\epsilon_2}{2}, y_2 + \frac{\epsilon_2}{2}) \times
(y_3 - \frac{\epsilon_2}{2}, y_3 + \frac{\epsilon_2}{2}) \times
\dotsb
$$ This proves that $B_{\rho}(f, \epsilon)$ is open in the box topology.
$\square$

The fact that more sequences of functions converge pointwise than uniformly brings us to one last interesting observation: a sequence of continuous functions can converge pointwise to a function with a discontinuity.

Example: The sequence of functions $f_n: {\Bbb R} \rightarrow {\Bbb R}$ defined by $$
f_n(x) =
\cases{
0 &\text{if } x \leq 0 \\
nx & \text{if } 0 < x \leq 1/n \\
1 &\text{if } x > 1/n
}
$$ consists entirely of continuous functions, as the below diagram of $f_2$ through $f_{10}$ illustrates:



However, since the slanted part gets narrower as $n \rightarrow \infty$, the sequence converges pointwise to the discontinuous function $$
f(x) =
\cases{
0 &\text{if } x \leq 0 \\
1 & \text{if } x > 0
}
$$ $\square$
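A quick numerical check of this example. The sample points below are arbitrary choices of mine; the supremum itself is handled in the comments rather than computed.

```python
import numpy as np

def f_n(x, n):
    # The continuous functions from the example: 0 for x <= 0, a slanted part
    # of slope n on (0, 1/n], and 1 afterwards.
    return np.clip(n * x, 0.0, 1.0)

def f(x):
    # The discontinuous pointwise limit: 0 for x <= 0, 1 for x > 0.
    return np.where(x > 0, 1.0, 0.0)

# Pointwise convergence: at any fixed x, f_n(x) -> f(x).
for x in [-1.0, 0.0, 0.01, 0.5]:
    print(x, [float(f_n(x, n)) for n in (1, 10, 100, 1000)], float(f(x)))

# No uniform convergence: just inside the slanted part, e.g. at x = 1/(2n),
# the gap |f_n(x) - f(x)| equals 1/2 no matter how large n is, so
# rho(f_n, f) >= 1/2 for every n (in fact the supremum of the gap is 1).
print([float(abs(f_n(1 / (2 * n), n) - f(1 / (2 * n)))) for n in (1, 10, 100, 1000)])
```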

However, if such a sequence converges uniformly, it is always to another continuous function. This is because the set of continuous functions is closed in the uniform topology (refer to the proof that closed sets contain their limit points, in the second preliminary post). I'll conclude this post with a definition and a proof.

Topological definition of continuity: Let $(X, \mathscr{T}_X)$ and $(Y, \mathscr{T}_Y)$ be topological spaces and $f$ be a function $X \rightarrow Y$. $f$ is called continuous if for all $x_0 \in X$, for every neighborhood $V \in \mathscr{T}_Y$ of $f(x_0)$, there exists a neighborhood $U \in \mathscr{T}_X$ of $x_0$ such that $f(x) \in V$ whenever $x \in U$.
$\square$

This definition is easily shown to be equivalent to the more familiar continuity criterion for ${\Bbb R}^{\Bbb R}$: $f$ is continuous if, for any $x_0 \in {\Bbb R}$ and any $\epsilon>0$, there exists a $\delta$ such that $|f(x)-f(x_0)|<\epsilon$ whenever $|x-x_0|<\delta$.

Uniform Limit Theorem: Let $X$ be a topological space and $(Y,d)$ be a metric space. Let $C$ be the set of continuous functions $X \rightarrow Y$. Then $C$ is a closed set in the uniform topology on $Y^X$.

Proof: Let $f$ be the uniform limit of a sequence of continuous functions. Then $f$ is a limit point of $C$, i.e. for any $\epsilon>0$, there exists a continuous function $g: X \rightarrow Y$ such that $\rho(f,g)<\epsilon / 3$, and thus $d(f(x),g(x)) < \epsilon / 3 \ ( \clubsuit )$ for any particular value of $x \in X$. Furthermore, since $g$ is continuous, for any $x_0 \in X$, there exists a neighborhood $U \subset X$ of $x_0$ such that $d(g(x),g(x_0)) < \epsilon / 3 \ ( \diamondsuit )$ for all $x \in U$.

Thus, for $x \in U$, we have $$
\begin{align}
d(f(x),g(x)) &< \epsilon / 3 \tag{by $\clubsuit$} \\[2mm]
d(f(x_0),g(x_0)) &< \epsilon / 3 \tag{by $\clubsuit$} \\[2mm]
d(g(x),g(x_0)) &< \epsilon / 3 \tag{by $\diamondsuit$}
\end{align}
$$ Since $d$ is a metric, we can use the triangle inequality to conclude that $$
\begin{align}
d(f(x),f(x_0)) &\leq d(f(x),g(x)) + d(g(x),g(x_0)) + d(g(x_0),f(x_0)) \\[2mm]
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} \\[2mm]
&= \epsilon
\end{align}
$$ So we have found a neighborhood, namely $U$, of $x_0$, such that $d(f(x),f(x_0)) < \epsilon$ whenever $x \in U$. This means that $f$ is continuous and thus $f \in C$. A set which contains all its limit points is a closed set, so $C$ is closed.
$\square$

Technically, I haven't proved that a set containing all its limit points is closed; to that end, note that if $S$ is not closed in a metric space $(X,d)$, then $X-S$ is not open. So there is a point $x \in X-S$ such that for every $n=1,2,3, \dotsc$, the open ball $B_{d}(x,\tfrac{1}{n})$ contains a point $x_n \in X-(X-S)=S$. Thus $(x_n)$ is a sequence of points in $S$ converging to a point $x$ not in $S$, so $S$ does not contain all its limit points. Thus, if a set does contain all its limit points, it must be closed.

Thank you for reading; please post any questions or feedback in the comments section.


How close is "close enough"? Metric Spaces, Topological Spaces, and Convergence

Preliminaries: set basics, Euclidean space


Open and closed sets; topology


You may be familiar with the concept of open and closed sets in Euclidean space ${\Bbb R}^n$. For those who aren’t, an open set is one that does not contain any of its boundary points. A closed set is one that contains all of its boundary points. The following graphic provides a good illustration (dotted lines indicate boundary points not included in the set, while solid lines indicate boundary points which are included):

Diagram 1: an intuitive depiction of open and closed sets

The complement of an open set is closed, and the complement of a closed set is open. The top-left item in the diagram also shows that some sets are neither open nor closed. In addition, the entire space ${\Bbb R}^n$ is both open and closed, as is the empty set.

Furthermore, taking unions and intersections of open and closed sets yields the following results, best summarized in a diagram:

Diagram 2: unions and intersections of open and closed sets

The following definition formalizes the notion of an open set in ${\Bbb R}^n$ and allows us to justify the conclusions above.

Definition of open/closed sets in ${\Bbb R}^n$: An open set is a set $U$ such that for each $x \in U$, there exists an $\epsilon>0$ (which may be very small) such that $y \in U$ whenever the distance between $x$ and $y$ is less than $\epsilon$. A closed set is one whose complement is an open set.

Diagram 3: definition of an open set $U$ in the plane ${\Bbb R}^2$

The set of all points $y$ located strictly within distance $\epsilon$ of a point $x$ is called the open ball of radius $\epsilon$ centered at $x$ and is denoted $B_{\epsilon}(x)$ as shown in the picture. Using this definition of open sets, it is easy to prove the properties in Diagram 2.

Note that the definition presented above actually relied on our having a notion of distance between two points. In a general space, this may not be the case, or there may be numerous measures of distance from which we can choose. The field of topology studies properties of more general spaces without necessarily relying on a distance function. To start, we define the open sets as a collection having some of the familiar properties presented above.

Definition of a topology: Let $X$ be a set, and let $\mathscr T$ be a subset of the power set of $X$, i.e. a collection of subsets of $X$. $\mathscr T$ is called a topology on $X$, and the elements of $\mathscr T$ are called open sets, if the following properties hold:
  1. The empty set and $X$ itself are in $\mathscr T$.
  2. Any union of (finitely or infinitely many) elements of $\mathscr T$ is also an element of $\mathscr T$.
  3. Any intersection of finitely many elements of $\mathscr T$ is also an element of $\mathscr T$.
Closed sets are defined as sets whose complements are open sets.
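For a finite set, the axioms can even be checked mechanically. The sketch below is a toy illustration with sets of my own choosing; for a finite collection, closure under arbitrary unions and finite intersections reduces to closure under pairwise unions and intersections.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the three topology axioms for a finite collection T of subsets of a finite set X."""
    X, T = frozenset(X), {frozenset(s) for s in T}
    # 1. The empty set and X itself are in T.
    if frozenset() not in T or X not in T:
        return False
    # 2./3. For a finite collection, it suffices to check pairwise unions and intersections.
    for A, B in combinations(T, 2):
        if A | B not in T or A & B not in T:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # True: a valid topology on X
print(is_topology(X, [set(), {1}, {2}, X]))      # False: {1} U {2} = {1, 2} is missing
print(is_topology(X, [X, set()]))                # True: the indiscrete (trivial) topology
```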

The "dumbest" example of a topology on a set $X$ is the set $\{ X , \emptyset \}$, called the discreet topology. Another example for $X = {\Bbb R}^n$ was the above-mentioned usual definition of open sets using the distance function. Namely, the open sets in ${\Bbb R}^n$ (in the usual topology) are the ones which contain an open ball of some radius around each of their points, or equivalently, contain none of their boundary points.

The open sets defined via open balls as described above comprise the so-called metric topology, in which every open set can be written as the union of open balls, and every point is contained in an open ball (actually, infinitely many open balls). Because of these two facts, we say that the open balls form a base for, or generate, the metric topology.


Metric Spaces


In defining the above-described metric topology on ${\Bbb R}^n$, we made use of the fact that we can measure the distance between two points using the well-known distance formula. Just as we abstracted properties of ${\Bbb R}^n$ into the definitions of topological spaces and vector spaces, we can do the same to define the properties a distance measure should have.

A distance function on a set $X$ should map any two elements of $X$ to a "distance." Formally, a function $d: X \times X \rightarrow [0, \infty)$ (i.e. a function taking any two inputs from $X$ and returning a non-negative real number) is called a metric if it satisfies the following 3 properties:

  1. Positive-definiteness: $d(x,y) \geq 0$, with $d(x,y)=0 \iff x=y$,
  2. Symmetry: $d(x,y)=d(y,x)$, and
  3. Triangle inequality: $d(x,z) \leq d(x,y) + d(y,z)$.

The triangle inequality: the length of one side of a triangle
cannot exceed the sum of the lengths of the other two.


For example, on the real number line, the absolute value of the difference between two numbers (i.e. the distance formula in one dimension) is a metric. The symmetry and positive-definiteness of $d(x,y) = |y-x|$ are obvious; to prove the triangle inequality $|z-x| \leq |z-y| + |y-x|$ for all real numbers $x,y,z$, note that we can set $a = z-y$ and $b=y-x$ so that $z-x = a+b$. Thus it suffices to prove that $|a+b| \leq |a| + |b|$ for all $a,b \in {\Bbb R}$.

Proof of the triangle inequality for absolute value: Note that $$
\begin{align}
-|a| &\leq a \leq |a| \tag{1} \\
-|b| &\leq b \leq |b| \tag{2}
\end{align}
$$ Adding inequalities (1) and (2) gives $$
-(|a|+|b|) \leq a+b \leq |a|+|b| \tag{$\star$}
$$ To simplify the notation, write $\phi := a+b$ and $\psi := |a|+|b|$. Then $( \star )$ says that $-\psi \leq \phi \leq \psi$. In other words, $| \phi | \leq \psi$. Translating back to $a$ and $b$, we have shown that $|a+b| \leq |a| + |b|$.
$\square$
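For instance, with $a = 3$ and $b = -5$, we get $|a+b| = |-2| = 2$, while $|a| + |b| = 8$; in general, the inequality is strict exactly when $a$ and $b$ are nonzero with opposite signs, and an equality otherwise.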

So the usual distance measure on the real number line is indeed a metric, making ${\Bbb R}$ a metric space. The usual topology on ${\Bbb R}$ is the metric topology generated by the open intervals $(a,b) = \{ x \ | \ a<x<b \}$. These are also the open balls defined by the absolute value metric, since $(a,b) = B_{(b-a)/2}(\frac{a+b}{2})$ and $B_{\epsilon}(x) = (x-\epsilon, x+\epsilon)$.

Given a topological space $(X, {\mathscr T})$, if the elements of $\mathscr T$, i.e. the open sets, are generated by the open balls $B_{\epsilon}(x)$ in some metric, then $(X, {\mathscr T})$ is called metrizable. There are examples of non-metrizable topological spaces which arise in practice, but in the interest of a reasonable post length, I will defer presenting any such examples until the next post. However, it is worth noting that non-metrizable spaces are the ones which necessitate the study of topology independent of any metric.


Convergence of sequences in a metric space


Suppose we have a sequence of points $x_1, x_2, x_3, \dotsc$ in a metric space $(X,d)$. To simplify the notation, such a sequence is often written as $(x_n)$. How do we know if the sequence has a limit? Take a look at the following diagrams:



The first sequence of points obviously does not have a limit; it never "closes in" on any point. The second sequence, on the other hand, does seem to have a limit. The formal definition makes this intuition concrete.

Definition of limit of a sequence (metric space): Let $(x_n)$ be a sequence in the metric space $(X,d)$. The point $L \in X$ is called the limit of $(x_n)$ if, for any $\epsilon >0$, there exists a number $N$ such that $d(x_n, L) < \epsilon$ whenever $n>N$. In this case, we write $L = \displaystyle{\lim_{n \rightarrow \infty}{x_n}}$.
$\square$

This means that given any threshold $\epsilon$, which can be arbitrarily small, the sequence points are eventually (i.e. for $n>N$, where $N$ may depend on $\epsilon$ and may be very large) all within distance $\epsilon$ of $L$.

Illustration of a convergent sequence: here, the sequence terms are all within
$\epsilon$ of $L$ after the 13th term. So for the choice of $\epsilon$ shown, $N=13$ works.


For example, the sequence of real numbers $1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dotsc, \frac{1}{2^n}, \dotsc$ has limit $0$: take a very small tolerance such as $\epsilon = \frac{1}{1 \mathord{,} 000 \mathord{,} 000}$. Then we have 
$$\left| \frac{1}{2^n} - 0 \right| < \frac{1}{1 \mathord{,} 000 \mathord{,} 000} \iff 2^n > 1 \mathord{,} 000 \mathord{,} 000 \iff n > \log_{2}{1 \mathord{,} 000 \mathord{,} 000}$$Similarly, for a general $\epsilon$, we have $|x_n-0| < \epsilon$ whenever $n$ is larger than $N = \log_{2}{(1 / \epsilon)}$. Since we have found a suitable $N$ for any $\epsilon$, $0$ is indeed the limit of the sequence.
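A quick sketch confirming this choice of $N$ numerically; the tolerances below are arbitrary examples.

```python
import math

# For the sequence x_n = 1/2^n and a given tolerance epsilon, the text claims
# N = log2(1/epsilon) works: |x_n - 0| < epsilon whenever n > N.
for epsilon in [1e-1, 1e-3, 1e-6]:
    N = math.log2(1 / epsilon)
    n = math.floor(N) + 1                 # the first index beyond N
    assert abs(1 / 2**n - 0) < epsilon    # within tolerance from this index on
    print(epsilon, N, n, 1 / 2**n)
```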



Convergence of sequences in a topological space


A topology $\mathscr T$ on a set $X$ specifies open and closed sets independently of any metric which may or may not exist on $X$. The definition of convergence above relied on a metric, but we can remove this reliance by defining convergence purely in terms of open sets.

Definition of limit of a sequence (topological space): Let $(x_n)$ be a sequence in the topological space $(X, {\mathscr T})$. The point $L \in X$ is called the limit of $(x_n)$ if, for any open set $U$ containing $L$, there exists a number $N$ such that $x_n \in U$ whenever $n>N$. In this case, we write $L = \displaystyle{\lim_{n \rightarrow \infty}{x_n}}$.
$\square$

In the metric space case above, these open sets $U$ were simply the open balls $B_{\epsilon}(L)$, so the two definitions agree in a metrizable space. Without a metric by which to define open balls in a general topological space, we replace these with open sets $U$ containing $L$, known as neighborhoods of $L$.

To conclude this post, let's take a look at an easy proof that demonstrates how things are done in topology and also ties in the notion of convergence. 

Recall that the closed sets in ${\Bbb R}^n$ are the ones containing all of their boundary points, or equivalently, the limits of all their convergent sequences (can you picture why every boundary point, as well as every interior point, of a set is the limit of a convergent sequence contained in that set?). An analogous statement holds in general topological spaces.

Closed sets contain their limit points: let $(X, {\mathscr T})$ be a topological space, and let $C \subseteq X$ be a closed set. Let $(c_n)$ be a convergent sequence of points all contained in $C$, with $\displaystyle{\lim_{n \rightarrow \infty}{c_n}}=L$. Then $L \in C$.

Proof: Since $C$ is closed, its complement $X-C$ is open (by the definition of closed sets). Assume $L \notin C$. Then $L \in X - C$, an open set. Since $X-C$ is an open set containing $L$, the definition of convergence tells us that there exists an $N$ such that $c_n \in X-C$ for all $n > N$. But this is a contradiction, since $c_n \in C$ for all $n$.
$\square$

In the next post, we'll explore a few topologies on sets of functions, and we'll see that the most intuitive one is not metrizable.

I hope you enjoyed this post, and please feel free to post any questions in the comments section below.