Powered by Blogger.

Basis and Dimension

Preliminary: Euclidean Space and Vectors

A sheet of paper is two-dimensional because, given any reference point on the sheet, which we can call $(0,0)$, any other point can be specified by how far to the right and how far up it is from the reference. This is just the notion of an arrow with a magnitude and direction from the first post on vectors.

Similarly, space as we know it is 3-dimensional because, given any reference point, which we can call $(0,0,0)$, any other point can be specified by its distance in length, width, and depth from the reference.

In this post, we'll formalize the dimension of a vector space by introducing the concept of a basis. I will introduce the basic terminology as well as full proofs of a few important results; while this won't be an exhaustive linear algebra primer by any means, any further questions are welcome in the comments section.

Basis of a vector space

Sticking with the 3-dimensional space example, let's call the $x$-axis the left-right direction, so that $(1,0,0)$ corresponds to a step one unit to the right. A step forward would be along the $y$-axis, or $(0,1,0)$, and a step upwards would be along the $z$-axis at $(0,0,1)$. As discussed in the preliminary post, an arrow of any direction and magnitude can be written as a linear combination of these 3 vectors. In symbols, if we call these three ${\bf e}_1$, ${\bf e}_2$, and ${\bf e}_3$, then any vector ${\bf v}$ can be written as $$
{\bf v} = a_1 {\bf e}_1 + a_2 {\bf e}_2 + a_3 {\bf e}_3
$$ for some scalars $a_1, a_2, a_3$. We write this vector as ${\bf v} = (a_1, a_2, a_3)$, implicitly expressing it in terms of the ${\bf e}_i$'s. Since any vector can be written this way, we say that the ${\bf e}_i$'s span the space ${\Bbb R}^3$.

If a vector ${\bf v}$ can be written as a linear combination of another set of vectors as above (without all $a_i$'s equal to $0$), then it is called linearly dependent on those vectors. If not, then the vectors are linearly independent. Clearly, we can't combine the ${\bf e}_i$ vectors above in any way to get the other ones, so they are linearly independent.

A set of vectors which (1) is linearly independent and (2) spans the entire vector space is called a basis, and the ${\bf e}_i$'s above are called the standard basis of ${\Bbb R}^3$. Essentially, a basis is a "minimal" (see the next section) set which still spans all of $V$ using the vector space operations of vector addition and scalar multiplication: the linear independence requirement ensures that a basis excludes any extra "trash" that doesn't add to the span. For example, since scalar multiplication is built into any vector space, if we have ${\bf e}_1 = (1,0,0)$ in our basis, we clearly don't also need $(2,0,0) = 2 {\bf e}_1$; any vector expressed as a linear combination including $(2,0,0)$ could include ${\bf e}_1$ instead with the coefficient doubled, so $(2,0,0)$ really isn't adding to the span.

On the other hand, we could remove $(1,0,0)$ from the standard basis and replace it with $(2,0,0)$ to obtain a different basis, so a basis certainly isn't unique. In fact, we will prove below that any linearly independent set of vectors can be extended to a basis. However, the constants $a_i$ which express a given vector in terms of a particular basis are unique: if different constants $b_i$ existed, then we could have a vector ${\bf v}$ such that $$
{\bf v} &= a_1 {\bf e}_1 + a_2 {\bf e}_2 + a_3 {\bf e}_3 \\[1mm]
&= b_1 {\bf e}_1 + b_2 {\bf e}_2 + b_3 {\bf e}_3 \\[2mm]
\implies {\bf v} - {\bf v} = {\bf 0} &= (a_1-b_1) {\bf e}_1 + (a_2-b_2) {\bf e}_2 + (a_3-b_3) {\bf e}_3
$$ If $a_i \neq b_i$ for at least one value of $i$, then this would contradict the fact that the basis vectors are linearly independent (since a linear combination of them without all coefficients equal to 0 yields the zero vector). Thus, we must have $a_i=b_i$ for all values of $i$, i.e. the representation of ${\bf v}$ in terms of the particular basis is unique. The same argument applies to vector spaces other than just ${\Bbb R}^3$.

Dimension of a vector space

The number of vectors forming a basis (which is infinite for some spaces) is called the dimension of the space. ${\Bbb R}^3$ has dimension $3$ because it has $\{ {\bf e}_1, {\bf e}_2, {\bf e}_3 \}$ as a basis, while the $\ell^p$ spaces from the last post have infinite dimension. Their standard bases consist of sequences with a $1$ for the $i$-th component and zeros elsewhere.

Defining dimension in this way implies that every basis has the same number of elements. This is indeed the case, and the proof for finite-dimensional spaces is a consequence of the following related lemma.

Lemma: If $B = \{ {\bf b}_1, {\bf b}_2, \dotsc, {\bf b}_n \}$ is a basis for a vector space $V$, and $W = \{ {\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_m \}$ is a linearly independent set of vectors, then $m \leq n$. If $m=n$, then $W$ spans $V$ and is thus another basis.

Proof: Since $B$ is a basis for $V$, we can express ${\bf w}_1$ as a linear combination of the vectors in $B$, $$
{\bf w}_1 = c_1 {\bf b}_1 + c_2 {\bf b}_2 + \dotsb + c_n {\bf b}_n
$$ where at least one of the $c_i$'s is nonzero. Let $j$ be an index such that $c_j \neq 0$. Then we have $$
{\bf b}_j = \tfrac{1}{c_j}{\bf w}_1 - \tfrac{c_1}{c_j}{\bf b}_1 - \dotsb - \tfrac{c_{j-1}}{c_j}{\bf b}_{j-1} - \tfrac{c_{j+1}}{c_j}{\bf b}_{j+1} - \dotsc - \tfrac{c_n}{c_j}{\bf b}_n
$$ Since $B$ is a basis, any vector ${\bf v} \in V$ can be written as a linear combination of the ${\bf b}_i$'s: $$
{\bf v} &= v_1 {\bf b}_1 + v_2 {\bf b}_2 + \dotsb + v_n {\bf b}_n \\[2mm]
&= v_1 {\bf b}_1 + \dotsb + v_{j-1} {\bf b}_{j-1} \\[1mm]
& \ \ \ + v_j \underbrace{\left[ \tfrac{1}{c_j}{\bf w}_1 - \tfrac{c_1}{c_j}{\bf b}_1 - \dotsb - \tfrac{c_{j-1}}{c_j}{\bf b}_{j-1} - \tfrac{c_{j+1}}{c_j}{\bf b}_{j+1} - \dotsc - \tfrac{c_n}{c_j}{\bf b}_n \right]}_{{\bf b}_j} \\[1mm]
& \ \ \  +  v_{j+1} {\bf b}_{j+1} + \dotsb + v_n {\bf b}_n
$$ This eliminates ${\bf b}_j$ from the expansion, showing that the arbitrary vector ${\bf v}$ is also a linear combination of the set $B_1 = \{ {\bf b}_1, {\bf b}_2, \dotsc , {\bf b}_{j-1}, {\bf w}_1, {\bf b}_{j+1}, \dotsc , {\bf b}_n \}$. Thus, $B_1$ spans $V$.

Furthermore, $B_1$ is linearly independent: since ${\bf w}_1$ can be uniquely expressed as a linear combination of the vectors in the original basis $B$, a linear combination of the vectors in $B_1$ can be uniquely rewritten as a linear combination of vectors in $B$ by replacing ${\bf w}_1$ by its expansion $c_1 {\bf b}_1 + c_2 {\bf b}_2 + \dotsb + c_n {\bf b}_n$. Since that unique expansion includes a nonzero factor of ${\bf b}_j$, the resulting linear combination of elements of $B$ has at least one nonzero coefficient. Therefore, if such a combination yielded ${\bf 0}$, that would contradict the fact that $B$ (assumed to be a basis) is linearly independent. So $B_1$ must be linearly independent and thus is a basis for $V$.

Now, since $B_1$ is a basis, we can uniquely express ${\bf w}_2$ as a linear combination $$
{\bf w}_2 = d_1 {\bf b}_1 + d_2 {\bf b}_2 + \dotsb + d_{j-1} {\bf b}_{j-1} + d_j {\bf w_1} + d_{j+1} {\bf b}_{j+1} + \dotsb + d_n {\bf b}_n
$$ Since $W$ is linearly independent, it is not the case that ${\bf w}_2 = d_j {\bf w}_1$ with all other $d_i$'s equal to zero, so there exists a $d_k \neq 0$ with $k \neq j$. Then we can repeat the whole process to kick out ${\bf b}_k$ and replace it with ${\bf w}_2$, yielding a new basis $B_2$.

If $m=n$, then we can continue in this fashion until we kick out the last of the original vectors from $B$ at the $n$-th iteration, obtaining a new basis $B_n$ consisting of the vectors ${\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_n$. This proves that any linearly independent set of $n$ vectors is a basis.

Furthermore, if $m>n$, we can continue the process and uniquely express ${\bf w}_{n+1}$ as a linear combination of the basis $B_n = \{ {\bf w}_1, {\bf w}_2, \dotsc, {\bf w}_n \}$. This is a contradiction, since $W$ is assumed to be linearly independent. Thus, it must be the case that $m \leq n$.

Corollary: If $A = \{ {\bf a}_1, \dotsc {\bf a}_m \}$ and $B = \{ {\bf b}_1, \dotsc {\bf b}_n \}$ are two bases for a finite-dimensional vector space $V$, then $m=n$.

Proof: By the above lemma, $m \leq n$ since $A$ is linearly independent, and $B$ is a basis. Also, $n \leq m$ since $B$ is linearly independent, and $A$ is a basis. Therefore, $m=n$.

Note that this also allows us to conclude that in a vector space $V$ with dimension $n$, a set with fewer than $n$ vectors cannot span $V$. Thus, a basis is a minimal set that still spans all of $V$. On the other hand, by the lemma, a basis is a maximal set of vectors that is still linearly independent.

Extending to a basis

If we have a vector space $V$ with finite dimension $n$ and a set $S$ of $p$ linearly independent vectors with $p<n$, then we can extend $S$ to a basis for $V$.

Indeed, since $S$ has only $p$ vectors, it cannot span $V$. Thus, there exists a vector ${\bf v}_1$ such that ${\bf v}_1$ is not in the span of $S$ (the set of all linear combinations of the vectors in $S$). So $S \cup \{ {\bf v}_1 \}$ is a linearly independent set of $p+1$ vectors.

Continuing in this manner until we have exactly $n$ vectors (which will take $n-p$ steps), we will obtain a set $S'$ of $n$ linearly independent vectors. The lemma above implies that $S'$ will span $V$ and thus be a new basis.

In practice, finding the vectors ${\bf v}_1, {\bf v}_2, \dotsc, {\bf v}_{n-p}$ as described above consists of either guess-and-check or solving sets of linear equations. For example, if we have the vectors $(0,3,0)$ and $(1,0,-2)$ and would like to extend to a basis for ${\Bbb R}^3$, then we need to find a vector ${\bf x} = (x_1, x_2, x_3)$ such that ${\bf x}$ is linearly independent from the first two vectors. If ${\bf x}$ were a linear combination, then we would have:$$
(x_1, x_2, x_3) = a_1 (0,3,0) + a_2 (1,0,-2)
$$ This gives us 3 equations: $$
x_1 &= a_2 \\[2mm]
x_2 &= 3 a_1 \\[2mm]
x_3 &= -2 a_2
$$ We just need to find one vector which does not satisfy these equations, regardless of the choice of the $a_i$'s. Since the first and third equations imply that $-\tfrac{1}{2}x_3 = x_1$, any vector not satisfying that equation will work. In this case ${\bf x} = (0,0,1)$ would work. Note that the solution is not unique; there are infinitely many solutions, such as ${\bf x} = (0,0,x_3)$ for any value of $x_3$ as well as many other solutions.

While the system of equations above was rather easy, we can use the more formal method of Gaussian elimination to simplify tougher systems.

Extending to a basis- infinite-dimensional case

In an infinite-dimensional vector space, we can also extend a linearly independent set to a basis, but the proof of this fact is not as simple as setting up a system of linear equations to solve (there would be infinitely many equations). In fact, the proof relies on the axiom of choice and thus does not show us how to actually perform the basis extension. For the sake of completeness, I'll provide a sketch of the proof anyway, with the warning that it won't make much sense to those not somewhat well versed in set theory.

Sketch of proof: Let $V$ be a vector space, not necessarily finite-dimensional, and let $S$ be a linearly independent subset of $V$. Let $A$ be the collection of sets whose members are the linearly independent subsets of $V$ which contain $S$. In symbols, $$
A = \{ T \subseteq V : \ S \subseteq T \text{ and $T$ is linearly independent} \}
$$ An arbitrary chain in $A$ takes the form $T_1 \subseteq T_2 \subseteq T_3 \subseteq \dotsb$ where $T_i \in A$ for $i=1,2,3,\dotsc$. The set $T_{\cup} = \bigcup_{i=1}^{\infty}{T_i}$ is an upper bound for the chain in that it is bigger (as defined by the $\subseteq$ relation) than each element of the chain.

Since all chains have an upper bound, Zorn's Lemma (equivalent to the axiom of choice, though I will omit the proof of this fact) implies that $A$ has a maximal element, i.e. an element $M$ such that $T \subseteq M$ for all $T \in A$. Suppose $M$ does not span $V$. Then there exists a vector ${\bf z} \in V$ which is not a linear combination of the vectors in $M$. But then $M' = M \cup \{ {\bf z} \}$ is a linearly independent set containing $M$ and thus $S$. So $M' \in A$, a contradiction since $M$ was supposed to be a maximal element. Thus it must be the case that $M$ does span $V$, so $M$ is a basis.

As with most AC-based proofs, this one isn't very satisfying, but there you have it: even in the infinite-dimensional case, you can extend a linearly independent set to a basis (but I can't tell you how!).

That will do it for this post- thanks for reading and please post any questions in the comments section.

The Axiom of Choice

Preliminary: set basics

The axiom of choice is a somewhat controversial axiom of set theory which is frequently utilized but not well known by most non-math folks. In this brief post, I'll demystify the axiom of choice and explain some of the issues that arise from its use.

Statement of the Axiom

The axiom of choice (or "AC" for short) states that, given a collection of non-empty sets, there exists a way to choose one element from each of them.

In order to state the axiom more precisely, we first need to define a choice function. Given a collection $X$ of non-empty sets, a choice function is a function which assigns to each set in the collection one of its own elements. In symbols, it is a function $$
f: X \rightarrow \bigcup_{A \in X}{A}
$$ with the property that for all $A \in X$, $f(A) \in A$.

Take a second to read through that again as it can be a bit confusing: $X$ is a set whose elements are sets. If $A$ is a set in this collection/set of sets (i.e. an element of $X$), then $A$ is assumed to be non-empty and thus has at least one element. A choice function takes sets like $A$ as inputs, and assigns to them an output from the set $\bigcup_{A \in X}{A}$ of all elements of sets (like $A$) in the collection. This output must be an element of $A$ itself (the input "value") every time in order for $f$ to be a choice function. Put differently, $f$ cannot assign to any set $A$ an output value $b$, where $b \in B$ for some other set $B \in X$ with $B \neq A$.

With that out of the way, the axiom of choice states that for any collection $X$ of non-empty sets, a choice function exists. This is the rigorous expression of the plain-English version above: given a collection of non-empty sets, there exists a way to choose one element from each of them.

"Sounds pretty obvious. What's the big controversy?"

As a practical example, suppose we have a bunch of buckets with items inside. The AC just tells us that we can pick one item from each. For a finite number of buckets, this really is obvious and actually can be proven by induction from other basic axioms of set theory.

However, suppose we have a countably infinite number of buckets, i.e. we have numbered buckets labeled $1, 2, 3, 4, \dotsc$ going on forever. You can go to the first bucket and reach in to choose an item, then the second, then the third, etc. For any finite number of buckets, this sequence of events will obviously terminate. But since the buckets go on forever, at no point will you ever have chosen an item from every bucket; a choice function needs to specify an item for every bucket. It's even worse if you have uncountably many buckets, e.g. if you have one bucket for every real number instead of just the numbers $1,2,3, \dotsc$.

Now, if you have a formulaic way to specify which item should be chosen from each bucket, then there's actually still no issue in defining a choice function. For example, suppose each of the infinitely many buckets contains several slips of paper, each with a single number $1,2,3, \dotsc$ on it and with no repeats within any buckets. Then each bucket would contain a slip with the smallest label. Thus, there exists a choice function for these buckets: choose the smallest label from each bucket. This is a precise specification for a selection of one item from every bucket.

The issue arises when there is no formulaic way to make a choice for all the buckets. For example, suppose $X$ is the set of all non-empty subsets of the real number line. This $X$ is uncountably infinite and contains some subsets such as $\{3, 123412, \pi, \text{all numbers greater than a million} \}$ for which we can still say, "choose the smallest element" (in this case, $3$). But $X$  also contains sets such as $(0,1) = \{x \in {\Bbb R} \ | \ 0 < x < 1 \}$ which have no smallest element. There is no formulaic way to specify how one would choose one element from every non-empty subset of the real numbers.

"Fine, I admit it can be a little dicey: clearly, you can't always write some kind of computer algorithm every time to create a choice function in practice. But does assuming the AC really cause any harm?"

Sort of, but it depends on your point of view.

One way to prove something exists is to specify an algorithm to construct it; this is called a constructive proof. On the other hand, when a proof of something's existence relies on the axiom of choice, it doesn't tell you how to actually construct the "something".

This may not be a problem if we don't care about constructing the thing in practice. But the AC also allows one to prove some bizarre results such as the Banach-Tarski paradox, which states that it is possible to reassemble pieces of a ball into two copies of the same ball just by moving and rotating (without stretching/shrinking). It contradicts basic geometric intuition that this could be the case, since the volumes of the two new balls would be the same as that of the original one. The role of the axiom of choice here is that it allows for the construction of sets of points for which "volume" is not defined (non-measurable sets). Of course, it doesn't actually tell us how to do this, only that it can be done.

To conclude, even though the AC seems obvious, it amounts to assuming the theoretical existence of objects which can never be built in practice. As a result, these objects (such as non-measurable sets) are basically impossible to picture or understand. These never arise in practical applications since you can't actually construct them without the AC, but they lead to some counterintuitive, seemingly paradoxical results.

In applications such as the stochastic calculus used in financial math, you will see the analysis restricted to measurable sets, as everything you can think up (without invoking the AC) will correspond to one of these anyway. I hope this post helps explain why authors need to go through all the trouble of making these restrictions, and also that it helps you recognize instances in which the AC has been invoked in your future forays into pure math.