The Plane in R^3

Prerequisites: Euclidean Space and Vectors

Suppose we have a plane in ${\Bbb R}^{3}$ which contains the point $(0,0,0)$ but does not intersect the axes at any other point. How many octants does the plane intersect?

Let's start by analyzing the 2-dimensional analog, then come back to the 3-dimensional problem with an algebraic approach and a geometric interpretation. Finally, we'll try to generalize the answer to higher dimensions.

Lines in the Plane ${\Bbb R}^{2}$

The 2-dimensional analog of a plane in ${\Bbb R}^{3}$ containing the origin is a line in ${\Bbb R}^{2}$ containing the origin. For those familiar with linear algebra, the former is a vector subspace of ${\Bbb R}^{3}$ of dimension 2 (i.e. co-dimension 1), and the latter is a vector subspace of ${\Bbb R}^{2}$ of dimension 1 (still co-dimension 1). If you aren't familiar with linear algebra, ignore that sentence and just keep reading.

The equation of a line in ${\Bbb R}^{2}$ is $y = mx + b$, where $m$ is the slope ("rise over run," or change in $y$ per change in $x$), and $b$ is the $y$-intercept (where the line hits the $y$-axis). A point $(x_{0},y_{0})$ is on the line if it satisfies the equation, i.e. if the equality $y_0 = mx_0 + b$ is true.

A line passing through the origin has a $y$-intercept of 0, thus $b=0$ and the equation is simply $y=mx$. Now, if $m$ is 0, then the line would just go along the $x$-axis, so if it doesn't touch the axis, we must have $m \neq 0$, so either $m > 0$ or $m < 0$. In the case where $m>0$, a positive $x$ value gives a positive $y$, and a negative $x$ gives a negative $y$, so the line would pass through quadrants 1 and 3 (top-right and bottom-left). In the case where $m<0$, the same analysis shows that the line passes through quadrants 2 and 4 (top-left and bottom-right). So the line passes through 2 quadrants.

The $y = mx + b$ formulation works fine for this question, but if we want to put $x$ and $y$ on more even footing (this will come in handy in the higher-dimensional cases so that we don't always have to solve for one of the variables), we can use the other form of the equation of a line, $ax + by = c$, where $a,b,c \in {\Bbb R}$ are constants. In order to have the origin on the line, we must have $c=0$, because the point $(0,0)$ is on the line, and thus $a(0)+b(0) = 0 = c$. So the equation is simply $ax+by=0$. If either $a$ or $b$ were zero, the line would just be one of the axes, so we must have $a,b \neq 0$. Whether they are positive or negative, you can work out by plugging in positive or negative $x$ and $y$ values that the line will either pass through quadrants 1 and 3 or 2 and 4.

For example, if $a,b>0$, then if $x>0$, we must have $y<0$ in order to have $ax+by=0$. Similarly, if $x<0$, then $y$ would have to be positive. So the line passes through quadrants 2 and 4. We get the same result in the case where $a,b<0$. If $a$ and $b$ have opposite signs, then the line will pass through quadrants 1 and 3.

Notice that the line does not pass through the quadrant containing the point $(a,b)$. The vector $(a,b)$ is actually perpendicular to the line $ax+by=0$, and we can see that from the fact that the equation can be rewritten as ${\bf n} \cdot {\bf x} = 0$ where ${\bf n} = (a,b)$ and ${\bf x} = (x,y)$ (recall that two vectors are perpendicular if and only if their dot product is zero).

The Plane in Space ${\Bbb R}^{3}$

Let's go back to the ${\Bbb R}^{3}$ case, building off of the above discussion. The general equation of a plane is $ax+by+cz=d$, where $a,b,c,d \in {\Bbb R}$ are constants. In order for the plane to contain $(0,0,0)$, we must have $d=0$, so the equation is now just $ax+by+cz=0$. If any of the constants is 0, then the plane's equation will look just like the equation of a line from above. For example, if $c=0$, then we'd have $ax+by=0$, which would be a line in the $xy$-plane extended vertically up and down in the positive and negative $z$ directions, and in fact containing the entire $z$-axis. Can you see why (hint: show that the points on the $z$-axis, i.e. points of the form $(0,0,z)$, all satisfy the equation of the plane)?

We can also interpret the equation geometrically as follows: the equation $ax+by+cz=0$ is equivalent to ${\bf n} \cdot {\bf x} = 0$, where ${\bf n} = (a,b,c)$ and ${\bf x} = (x,y,z)$. Note that here, ${\bf n}$ and ${\bf x}$ are vectors, and their dot product is a scalar, so the $0$ on the right is a scalar zero, not the zero vector ${\bf 0} = (0,0,0)$.

As mentioned above, the dot product of two vectors is zero if and only if the vectors are perpendicular. Therefore, this equation is saying that any vector ${\bf x}$ that is perpendicular to ${\bf n}$ is on the plane. For this reason, ${\bf n}$ is called the plane's normal vector (normal is a synonym for perpendicular, as is orthogonal, which is also used frequently). In the example above where $c=0$, the plane's normal vector is $(a,b,0)$, which lies in the $xy$-plane. Thus, the $z$-axis, being orthogonal to the $xy$-plane, is contained in our plane.

Now, in order to answer the geometric question of which vectors are orthogonal to ${\bf n}$, we can look at the algebraic equation $ax+by+cz=0$.

Since the plane does not touch the axes except at the origin, we must have $a,b,c \neq 0$. As an example, let's look at the case where $a,b,c>0$. Then we can have the following combinations for $(x,y,z)$ in order to have $ax+by+cz=0$:
$$(+,+,-) \\
(+,-,+) \\
(+,-,-) \\
(-,-,+) \\
(-,+,-) \\
(-,+,+)$$ The remaining two combinations, $(+,+,+)$ and $(-,-,-)$, do not work, because then the left side of the equation would have to be positive or negative (respectively) and thus not zero.

If we grind through the algebra of the other 7 combinations for $(a,b,c)$, we see that we get 6 possibilities each time, so the plane intersects 6 of the 8 octants, and we have the answer to the problem. I'm not going to go through all the cases, because that would be quite boring, but you can see that there is a certain symmetry in the plane's equation between $(a,b,c)$ and $(x,y,z)$. Once you've solved it for the case $a,b,c>0$, you've pretty much solved it for all the cases. Can you see why? So we've got the answer- it's 6.
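If you'd rather not grind through the cases by hand, a few lines of Python can do it. This is just a sketch of the sign argument above (the function name `octants_hit` is my own invention): an octant can contain a solution exactly when the three terms $ax$, $by$, $cz$ don't all have the same sign, since otherwise they can't cancel.

```python
from itertools import product

def octants_hit(a, b, c):
    """List the sign patterns (sx, sy, sz) of the octants that contain a
    solution of a*x + b*y + c*z = 0, assuming a, b, c are all nonzero."""
    hit = []
    for signs in product((1, -1), repeat=3):
        # True where the term a*x, b*y or c*z is positive for these signs.
        terms = [coef * s > 0 for coef, s in zip((a, b, c), signs)]
        # The three terms can sum to zero only if their signs are mixed.
        if any(terms) and not all(terms):
            hit.append(signs)
    return hit

print(len(octants_hit(1, 2, 3)))  # 6

# Every sign combination for (a, b, c) gives the same count.
for normal in product((1, -1), repeat=3):
    assert len(octants_hit(*normal)) == 6
```

The two octants left out are always the ones whose signs match $(a,b,c)$ or $(-a,-b,-c)$.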

The Hyperplane in ${\Bbb R}^{n}$

${\Bbb R}^{n}$ is the $n$-dimensional analog of ${\Bbb R}^{3}$ and is the set of ordered $n$-tuples of real numbers: ${\Bbb R}^{n} = \{ (x_1, x_2, ..., x_n) \ \colon \ {\scr each} \ x_i \in {\Bbb R} \}$. We can't picture this $n$-dimensional space, but we can use the same types of algebraic equations that work in ${\Bbb R}^{3}$ to analyze ${\Bbb R}^{n}$.

${\Bbb R}^{n}$ is divided into $2^n$ orthants, also known as hyperoctants or $n$-hyperoctants, based on the signs, positive or negative, of the $n$ components of a point. A 2-hyperoctant is a quadrant in ${\Bbb R}^{2}$ and a 3-hyperoctant is an octant in ${\Bbb R}^{3}$. The $x_i$-axis is the set of points where all coordinates except possibly the $i^{\scr th}$ are zero.

A hyperplane in ${\Bbb R}^{n}$ is a set $P$ of points (equivalently, vectors) that are orthogonal to a nonzero normal vector ${\bf n} = (a_1, a_2, ..., a_n)$. In symbols, $P = \{ {\bf x} \in {\Bbb R}^{n} \ \colon \ {\bf n} \cdot {\bf x} = 0 \}$. For those familiar with linear algebra, a hyperplane containing the origin is a vector subspace of ${\Bbb R}^{n}$ of dimension $n-1$, i.e. co-dimension 1. A hyperplane in ${\Bbb R}^{2}$ is a line, and a hyperplane in ${\Bbb R}^{3}$ is a plane.

How many $n$-hyperoctants does a hyperplane $P \subset {\Bbb R}^{n}$ intersect, given that it contains the origin, but does not intersect the axes at any other point?

To answer this question, we can use the discussion above from the $n=2$ and $n=3$ cases and generalize the results. We can then prove the answer is correct using induction.

When we went from $n=2$ to $n=3$, we took the equation $ax+by=0$ (i.e. the line in ${\Bbb R}^{2}$ with normal vector $(a,b)$), extended it into 3-dimensional space to make the plane whose normal vector is $(a,b,0)$, and then added a non-zero third coordinate to the normal vector to "tilt" the plane off of the $z$-axis.

Now, a hyperplane (including the line and plane in the $n=2$ and $n=3$ cases) is orthogonal to its normal vector ${\bf n}$ as well as the negative of the normal vector, $-{\bf n}$. In fact, the hyperplane is orthogonal to any scalar multiple of ${\bf n}$, but my point in mentioning $-{\bf n}$ is that the hyperplane won't intersect the $n$-hyperoctants that contain ${\bf n}$ or $-{\bf n}$.

Let's look at the case where ${\bf n}$ lies in the first $n$-hyperoctant, i.e. has all positive coordinates. As mentioned above, the other cases are pretty much the same because of the symmetries of the equation ${\bf n} \cdot {\bf x} = \sum_{i=1}^{n}{a_i x_i} = 0$, so the number of $n$-hyperoctants the hyperplane intersects is the same in all cases. In the case that the $a_i$ are positive, the hyperplane doesn't intersect the first $n$-hyperoctant or the one with all negative coordinates (whatever number we want to assign to that one).

In the ${\Bbb R}^{2}$ case, the line intersects quadrants 2 and 4. When we extended ${\bf n} = (a,b)$ to ${\bf n} = (a,b,0)$ in ${\Bbb R}^{3}$, we got a plane that contained the entire $z$-axis. The intersection of this plane with the $xy$-plane is the line $ax+by=0$, which remains the case regardless of the third coordinate of ${\bf n}$. Now, this plane intersects the octants $(+,-,+)$, $(+,-,-)$, $(-,+,+)$, and $(-,+,-)$. We took the original quadrants 2 and 4 and multiplied them by 2 to get 4 octants.

When we add a non-zero third coordinate to ${\bf n}$ (let's assume it's positive), the new plane also intersects two additional octants: $(+,+,-)$ and $(-,-,+)$. The first two coordinates of these two would not have been included in the 2-d case, but the third coordinate allows us to use those combinations and still get the expression $ax+by+cz$ to equal zero. $(+,+,+)$ and $(-,-,-)$ still don't work though.

The same logic works when going from ${\Bbb R}^{n}$ to ${\Bbb R}^{n+1}$ when $n>2$, and we can prove it by induction.

Theorem: For $n \geq 2$, a hyperplane in ${\Bbb R}^{n}$ containing the origin, but not intersecting the coordinate axes at any other point, intersects $2^{n}-2$ $n$-hyperoctants.

The proof is a bit lengthy, but basically just formalizes the idea of extending the line in ${\Bbb R}^{2}$ into a plane in ${\Bbb R}^{3}$ and then tilting it off the $z$-axis.

Proof: The base case of $n=2$ was already shown above.

For the induction step, assume the theorem is true for ${\Bbb R}^{n-1}$, and consider a hyperplane $P = \{{\bf x} \in {\Bbb R}^{n} \ \colon \ {\bf n} \cdot {\bf x}=0 \}$ where ${\bf n} = (a_1,a_2,...,a_n)$.

The equation of $P$ is $\sum_{i=1}^{n}{a_i x_i} = \sum_{i=1}^{n-1}{a_i x_i} + a_n x_n = 0$. By the induction hypothesis, the solutions to the equation $\sum_{i=1}^{n-1}{a_i x_i} = 0$ intersect $2^{n-1}-2$ $(n-1)$-hyperoctants. Let's call the set of those solutions $P_{n-1}$, which is a hyperplane in ${\Bbb R}^{n-1}$.

Take a point ${\bf x}_{0, n-1} = (x_{0,1}, x_{0,2},...,x_{0,n-1}) \in {\Bbb R}^{n-1}$ which satisfies the equation of $P_{n-1}$. If $x_{0,1}>0$, then $x_{0,1}+\epsilon>0$ as well, where $\epsilon = \frac{1}{2}|x_{0,1}|$. Similarly, if $x_{0,1}<0$, then $x_{0,1}+\epsilon<0$ as well, so the point ${\bf x}_{1,n-1} = (x_{0,1}+\epsilon, x_{0,2},...,x_{0,n-1})$ is in the same $(n-1)$-hyperoctant as ${\bf x}_{0, n-1}$. By a similar argument, so is the point ${\bf x}_{2,n-1} = (x_{0,1}-\epsilon, x_{0,2},...,x_{0,n-1})$.

Define the points ${\bf x}_{1} = (x_{0,1}+\epsilon, x_{0,2},..., x_{0,n-1}, -\frac{a_1}{a_n}\epsilon), \ {\bf x}_{2} = (x_{0,1}-\epsilon, x_{0,2},..., x_{0,n-1}, \frac{a_1}{a_n}\epsilon) \in {\Bbb R}^{n}$. Then $$ \begin{align}
{\bf n} \cdot {\bf x}_{1}
&= a_1 (x_{0,1}+\epsilon) + a_2 x_{0,2} + ... + a_{n-1} x_{0,n-1} + a_n (-\frac{a_1}{a_n}\epsilon) \\[2mm]
&= a_1 (x_{0,1}+\epsilon -\epsilon) + a_2 x_{0,2} + ... + a_{n-1} x_{0,n-1} \\[2mm]
&= a_1 x_{0,1} + a_2 x_{0,2} + ... + a_{n-1} x_{0,n-1} = 0
\end{align} $$
with the final equality being true because ${\bf x}_{0,n-1} \in P_{n-1}$.

This shows that ${\bf x}_{1} \in P$. Similarly, ${\bf x}_{2} \in P$. The first $n-1$ coordinates of these two points are in the same $(n-1)$-hyperoctant as ${\bf x}_{0,n-1}$, and the $n^{\scr th}$ coordinates of ${\bf x}_{1}$ and ${\bf x}_{2}$ have opposite sign. This shows that we have kept the $2^{n-1}-2$ $(n-1)$-hyperoctants of the $(n-1)$-hyperplane when we extended it into ${\Bbb R}^{n}$ and actually multiplied them by 2 (by adding both positive and negative $n^{\scr th}$ coordinates) to get $2(2^{n-1}-2)$ = $2^{n}- 4$ $n$-hyperoctants.

We just need to show that we've also added two more $n$-hyperoctants. These are the ones where the first $n-1$ coordinates all have the same sign or all have the opposite sign as the first $n-1$ coordinates of ${\bf n}$, just like when we went from $n=2$ to $n=3$ above. Examples of solutions to the equation of $P$ that are in those 2 $n$-hyperoctants are $(a_1,a_2,...,a_{n-1},-\dfrac{1}{a_n}\sum_{i=1}^{n-1}{a_i^2})$ and $(-a_1,-a_2,...,-a_{n-1},\dfrac{1}{a_n}\sum_{i=1}^{n-1}{a_i^2})$.

So now we are up to $2^{n}-4+2 = 2^{n}-2$, so $P$ intersects at least that many $n$-hyperoctants. There are only 2 more $n$-hyperoctants, and those are the ones that contain $\pm {\bf n}$, but we already know that points in those $n$-hyperoctants cannot satisfy the equation of ${\bf n} \cdot {\bf x} = 0$, so $P$ intersects exactly $2^{n}-2$ of the $2^n$ $n$-hyperoctants, and the theorem is proved.
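If you want a sanity check on the count, here's a rough Python sketch. Note that it does not follow the induction in the proof; instead it builds a point directly in each hyperoctant where that's possible, by making the positive terms of ${\bf n} \cdot {\bf x}$ add up to $1$ and the negative terms add up to $-1$. The names `witness` and `count_orthants` are made up for this example, and I'm assuming a normal vector with non-zero integer entries so that the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product

def witness(normal, signs):
    """Return a point on the hyperplane normal . x = 0 whose coordinates
    have the given signs, or None if no such point exists."""
    pos = [i for i, (a, s) in enumerate(zip(normal, signs)) if a * s > 0]
    neg = [i for i, (a, s) in enumerate(zip(normal, signs)) if a * s < 0]
    if not pos or not neg:
        return None  # all terms share one sign, so they cannot cancel
    point = [Fraction(0)] * len(normal)
    for group in (pos, neg):
        for i in group:
            # Each term a_i * x_i in this group has absolute value 1/len(group).
            point[i] = Fraction(signs[i], abs(normal[i]) * len(group))
    return point

def count_orthants(normal):
    count = 0
    for signs in product((1, -1), repeat=len(normal)):
        point = witness(normal, signs)
        if point is not None:
            # Double-check the witness: on the hyperplane, right signs.
            assert sum(a * x for a, x in zip(normal, point)) == 0
            assert all(x * s > 0 for x, s in zip(point, signs))
            count += 1
    return count

for n in range(2, 8):
    normal = tuple(range(1, n + 1))  # an arbitrary non-zero integer normal
    assert count_orthants(normal) == 2**n - 2
```

Of course, checking a handful of cases is no substitute for the proof, but it's reassuring to see the numbers agree.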

Here's a diagram illustrating the objects described in the proof in the case where $n=3$ and $n-1=2$. Apologies for the low quality (I made it in MS Paint), but note that the bottom of the red plane, $P$, comes out towards the viewer, in front of the blue plane, and the top half of $P$ is behind the blue plane. The points ${\bf x}_{0,n-1}$, ${\bf x}_{1,n-1}$, and ${\bf x}_{2,n-1}$ are in the $x_{1}x_{2}$-plane, with ${\bf x}_{1,n-1}$ in front of $P$ and ${\bf x}_{2,n-1}$ behind $P$.

Thanks for reading. Post any questions in the comments section.


The natural numbers, denoted ${\Bbb N}$, are the counting numbers $\{ 0, 1, 2, 3, ... \}$.

If a statement about the natural numbers is true for some base case $n = n_0$ (usually $n_0 = 0$ or $n_0 = 1$), and if we can prove that if the statement is true for $n-1$, it is also true for $n$, then the statement is true for all natural numbers $n \geq n_0$. This is the axiom of induction.

Basically, the axiom of induction is like dominoes- if the first domino falls, and if domino $(n-1)$'s falling knocks over domino $n$, then all the dominoes will fall.

This can be a powerful proof technique for statements about the natural numbers.

Here's an example:

Theorem: The sum of the natural numbers up to and including $n$, $\sum_{i=0}^{n}{i} = 0+1+2+...+n$, is equal to $\dfrac{n(n+1)}{2}$.

Proof: The base case where $n=1$ is true because $0+1 = 1 = \dfrac{2}{2} = \dfrac {(1)(2)}{2}$. (The case $n=0$ is immediate, since both sides are just 0.) For the induction step, assume (this assumption is called the induction hypothesis) that the statement is true for $n-1$. We need to prove it's true for $n$ as well.

Using the induction hypothesis after the second $=$ sign, we get: $$
\sum_{i=0}^{n}{i} = \sum_{i=0}^{n-1}{i} + n = \dfrac{(n-1)(n)}{2} + n = \dfrac{(n-1)(n)}{2} + \dfrac{2n}{2} = \\[5mm]
\dfrac{(n-1)(n)+2n}{2} = \dfrac{(n-1+2)(n)}{2} = \dfrac{n(n+1)}{2}
$$ That proves the statement for $n$, so the theorem is proved for all natural numbers by the induction axiom.
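A quick numerical check in Python is no replacement for the proof, but it's a nice way to catch algebra mistakes. In this sketch (the helper name `triangular` is my own), the second assertion mirrors the induction step: the formula for $n$ equals the formula for $n-1$ plus $n$.

```python
def triangular(n):
    """Closed form n(n+1)/2 from the theorem (always a whole number)."""
    return n * (n + 1) // 2

for n in range(0, 1001):
    # Compare the closed form with the sum 0 + 1 + ... + n computed directly.
    assert sum(range(n + 1)) == triangular(n)
    if n >= 1:
        # The induction step: formula(n) = formula(n-1) + n.
        assert triangular(n) == triangular(n - 1) + n
```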

Euclidean Space and Vectors

${\Bbb R}^{n}$

Imagine the number line floating in space, and at 0, we put another copy of the number line at a 90 degree angle to the first. These are the $x$- and $y$-axes, and together, they make a plane (a flat sheet that goes on forever in all directions). Any point on the plane can be identified by how far out it is on the $x$-axis and then how far out on the $y$-axis. The two numbers describing the point's position are called the point's coordinates, and the plane is called the $xy$-plane. The set of points on the number line, for example 4 or 3.78123947321 or $\sqrt{2} \approx 1.41421356237$, is denoted by the symbol ${\Bbb R}$. If you are not familiar with sets and the notations for describing them, check out the post on sets.

A point in the $xy$-plane with coordinates $x_0$ and $y_0$ is written as an ordered pair $(x_0,y_0)$. The set of ordered pairs of real numbers is the Cartesian Product of ${\Bbb R}$ with itself, i.e. ${\Bbb R} \times {\Bbb R}$ and is also written ${\Bbb R}^{2}$. So ${\Bbb R}^{2}$ is the plane.

Now imagine we add a third copy of the number line to our $xy$-plane at the point $(0,0)$ (i.e. where the $x$- and $y$-axes cross), this time perpendicular to the plane. This is the $z$-axis, and now we have a 3-dimensional space where a point is identified by an ordered triple $(x,y,z)$. The set of such points is called ${\Bbb R}^{3}$ and is just ordinary 3-dimensional space.

If $n$ is an integer greater than 3, then we could consider the set of ordered $n$-tuples $(x_1, x_2, x_3, ... , x_n)$. This is ${\Bbb R}^{n}$ or $n$-dimensional space. We can't picture this space, but we can do the same math with it that we can do with ${\Bbb R}^{2}$ and ${\Bbb R}^{3}$, so it has many applications. Let's focus on ${\Bbb R}^{3}$ for now.

The $x$-, $y$-, and $z$-axes divide 3-dimensional space into 8 regions, called octants. For example, one octant consists of points whose $x$-, $y$-, and $z$-coordinates are all positive. Another octant contains points whose $x$-coordinate is positive, but whose $y$- and $z$-coordinates are negative. And so on. There are 3 coordinates and 2 possibilities (positive or negative) for each coordinate, thus $2^3 = 8$ regions. Points with one coordinate whose value is equal to 0 lie in one of the coordinate planes ($xy$-plane, $xz$-plane, or $yz$-plane). Points with two 0 coordinates lie on one of the axes, and the point with all coordinates equal to 0 is called the origin.

The distance between two points $\textbf{x}_{1}=(x_1,y_1,z_1)$ and $\textbf{x}_{2}=(x_2,y_2,z_2)$ is the length of the straight line segment between them. The formula for this distance (the distance formula) is $d(\textbf{x}_{1},\textbf{x}_{2})=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}$, which can be worked out by using the Pythagorean Theorem a few times. There could be alternative ways of measuring distance under which a straight line would not be the shortest distance between two points (e.g. measuring distance on the surface of a sphere which sits in ${\Bbb R}^{3}$), but let's stick with the straight-line distance. ${\Bbb R}^{3}$ endowed with this metric for distance is known as (3-dimensional) Euclidean space.


A point in ${\Bbb R}^{2}$ or ${\Bbb R}^{3}$ can also be thought of as a vector, which is an arrow starting at the origin and terminating at the point in question. The origin is the zero vector, denoted $\vec{0}$ or $\bf{0}$. A vector has a magnitude, which is its length, and a direction in which it points. So for example, you could have a vector with length 5 that rises 45 degrees above the $xy$-plane in the positive $z$ direction (i.e. up) and whose shadow on the $xy$-plane makes a 30 degree angle with the positive $x$-axis. This vector would terminate at the point $(\frac{5\sqrt{6}}{4},\frac{5\sqrt{2}}{4},\frac{5\sqrt{2}}{2}) \approx (3.06,1.77,3.54)$. Can you figure out how I got those coordinates from the description of the angles and length?
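If you want to check your answer, here's a small Python sketch of the conversion. It assumes the 30 degrees is measured in the $xy$-plane from the positive $x$-axis to the vector's shadow, and the 45 degrees is the angle between the vector and the $xy$-plane; the function and parameter names are my own.

```python
import math

def vector_from_angles(length, azimuth_deg, elevation_deg):
    """Convert a length and two angles (in degrees) to (x, y, z).

    azimuth_deg:   angle of the vector's shadow in the xy-plane, measured
                   from the positive x-axis (an assumed reading of the text).
    elevation_deg: angle between the vector and the xy-plane.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    shadow = length * math.cos(elevation)  # length of the xy-plane shadow
    return (shadow * math.cos(azimuth),
            shadow * math.sin(azimuth),
            length * math.sin(elevation))

x, y, z = vector_from_angles(5, 30, 45)
print(round(x, 2), round(y, 2), round(z, 2))  # 3.06 1.77 3.54
```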

The 3 coordinates specify the direction (which can also be equated with angles from the axes using some sines and cosines), and the magnitude of the vector can be worked out using the distance formula. For a vector $\vec{x} = (x,y,z)$, the magnitude is $\|\vec{x}\| = \sqrt{x^{2}+y^{2}+z^{2}}$. This is a direct consequence of the distance formula mentioned above. Note that in addition to $\vec{x}$, we often see vectors written as $\bf{x}$ or $\vec{\bf{x}}$.

In physics, vectors are used to represent, among other things, an object's velocity, which has a magnitude (speed) and direction that changes over time.

Operations with Vectors


In the context of vectors of ${\Bbb R}^{n}$, a regular number $c \in {\Bbb R}$ is called a scalar (since it scales a vector).

Given a vector ${\bf x} = (x,y,z)$ (note that $x$, $y$, and $z$ are real numbers as stated above) and a scalar $c$, we can define a new vector $c {\bf x}$ by $c{\bf x} = (cx,cy,cz)$. This operation on a vector is called scalar multiplication. A vector whose magnitude is 1 is called a unit vector. To make a unit vector in the same direction as a vector ${\bf x}$, simply scalar multiply ${\bf x}$ by the number $\frac{1}{\| {\bf x} \|}$.

We can also add two vectors ${\bf x} = (x_1,x_2,x_3)$ and ${\bf y} = (y_1,y_2,y_3)$ by defining the new vector ${\bf x}+{\bf y}$ to be the vector whose coordinates are the sum of the coordinates of ${\bf x}$ and ${\bf y}$, i.e. ${\bf x}+{\bf y} = (x_1+y_1,x_2+y_2,x_3+y_3)$. This is called vector addition.

The scalar multiplication and vector addition defined above satisfy the following eight properties (which can be proved easily using the definitions) called the vector space axioms. In the following, boldface symbols represent vectors in ${\Bbb R}^{3}$ and italic symbols and non-boldface numbers represent scalars, i.e. elements of ${\Bbb R}$.

Associativity of vector addition: 
$\ \ \ \ \ \ \ \ {\bf x} + ({\bf y} + {\bf z}) = ({\bf x} + {\bf y}) +{\bf z}$
Commutativity of vector addition:
$\ \ \ \ \ \ \ \ {\bf x} + {\bf y} = {\bf y} + {\bf x}$
Additive identity: for all vectors ${\bf x}$, the zero vector ${\bf 0} = (0,0,0)$ has the property that
$\ \ \ \ \ \ \ \ {\bf 0} + {\bf x} = {\bf x}$
Additive inverse: for a vector ${\bf x} = (x_1,x_2,x_3)$, the vector $-{\bf x} = (-x_1,-x_2,-x_3)$ has the property that
$\ \ \ \ \ \ \ \ -{\bf x} + {\bf x} = {\bf 0}$
Compatibility of scalar multiplication and field multiplication:
$\ \ \ \ \ \ \ \ a(b{\bf x}) = (ab){\bf x}$
Identity element of scalar multiplication:
$\ \ \ \ \ \ \ \ 1{\bf x} = {\bf x}$
Distributivity of scalar multiplication with respect to vector addition:
$\ \ \ \ \ \ \ \ a({\bf x} + {\bf y}) = a{\bf x} + a{\bf y}$
Distributivity of scalar multiplication with respect to field addition:
$\ \ \ \ \ \ \ \ (a+b){\bf x} = a{\bf x} + b{\bf x}$
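Here's a small Python sketch that spot-checks the eight axioms on a few sample vectors. The helper names `add` and `scale` are my own, vectors are plain tuples, and I've used integers so that the comparisons are exact. A spot check like this is not a proof, but the proofs are just as short: each one follows from the matching property of real numbers, applied coordinate by coordinate.

```python
def add(x, y):
    """Vector addition, coordinate by coordinate."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(c, x):
    """Scalar multiplication of the vector x by the scalar c."""
    return tuple(c * xi for xi in x)

zero = (0, 0, 0)
x, y, z = (1, -2, 3), (4, 0, -5), (-7, 8, 9)
a, b = 2, -3

assert add(x, add(y, z)) == add(add(x, y), z)                # associativity
assert add(x, y) == add(y, x)                                # commutativity
assert add(zero, x) == x                                     # additive identity
assert add(scale(-1, x), x) == zero                          # additive inverse
assert scale(a, scale(b, x)) == scale(a * b, x)              # compatibility
assert scale(1, x) == x                                      # identity element
assert scale(a, add(x, y)) == add(scale(a, x), scale(a, y))  # distributivity 1
assert scale(a + b, x) == add(scale(a, x), scale(b, x))      # distributivity 2
```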

Note: these properties inspire the definition of a vector space as any set (whose elements we call vectors) and a set of numbers (from a field, whose elements we call scalars, usually the real numbers ${\Bbb R}$ or the complex numbers ${\Bbb C}$) with a definition of scalar multiplication and vector addition that satisfy the axioms above. For example, the set of functions $f$ mapping real numbers $x$ to real numbers $f(x)$ is a vector space with the real numbers as scalars if we define scalar multiplication by $(af)(x) = af(x)$ and vector addition by $(f+g)(x) = f(x) + g(x)$. You can prove that these definitions satisfy the axioms as an exercise if you want.

There are two more important vector operations that allow us to "multiply" two vectors in ${\Bbb R}^{3}$.

The dot product (also known as the scalar product or inner product) of two vectors produces a scalar: ${\bf x} \cdot {\bf y} = x_1 y_1 + x_2 y_2 + x_3 y_3$. This is also defined in ${\Bbb R}^{n}$ when $n>3$ in the same way, but with $n$ summands instead of 3: $${\bf x} \cdot {\bf y} = \sum_{i=1}^{n}{x_i y_i}$$ The dot product is related to the angle $\theta$ between two vectors by the formula ${\bf x} \cdot {\bf y} = \| {\bf x} \| \| {\bf y} \| \cos \theta$. So the dot product is maximized when the vectors point in the same direction, zero when they are perpendicular, and minimized (negative) when the vectors point in opposite directions.

The cross product is only defined in ${\Bbb R}^{3}$ and produces a new vector whose direction is perpendicular to both of the two input vectors and whose magnitude is equal to the area of the parallelogram spanned by the two vectors, which is $\| {\bf x} \| \| {\bf y} \| \sin \theta$. The definition of the cross product is ${\bf x} \times {\bf y} = (x_2 y_3 - y_2 x_3, - x_1 y_3 + x_3 y_1, x_1 y_2 - x_2 y_1)$.

To obtain a vector perpendicular (also known as normal) to two vectors, we can take their cross product, and we can always make it into a unit normal vector by dividing by (i.e. scalar multiplying by 1 divided by) the magnitude of the cross product.
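To make the formulas concrete, here's a short Python sketch of the dot product, the cross product, and the unit normal, written straight from the definitions above. The function names are my own and vectors are plain tuples.

```python
import math

def dot(x, y):
    """Dot product; works for vectors of any (equal) length."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(x, y):
    """Cross product of two vectors in R^3."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - y2 * x3, -x1 * y3 + x3 * y1, x1 * y2 - x2 * y1)

def norm(x):
    """Magnitude of a vector."""
    return math.sqrt(dot(x, x))

def unit(x):
    """Unit vector in the direction of a non-zero vector x."""
    length = norm(x)
    return tuple(xi / length for xi in x)

x, y = (1, 2, 3), (4, 5, 6)
n = cross(x, y)
print(n)                      # (-3, 6, -3)
print(dot(n, x), dot(n, y))   # 0 0, so n is perpendicular to both

# The magnitude of the cross product equals |x| |y| sin(theta).
cos_theta = dot(x, y) / (norm(x) * norm(y))
sin_theta = math.sqrt(1 - cos_theta**2)
assert math.isclose(norm(n), norm(x) * norm(y) * sin_theta)
assert math.isclose(norm(unit(n)), 1.0)
```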

Equivalence Relations

Prerequisites: Sets and Important Notations

Equivalence relations are an important concept that will be needed in some later posts. The good news is that they are relatively easy to understand.

For the rest of this post, let $A$ be a set.

A binary relation on $A$ is a set $R$ of ordered pairs of elements of $A$, i.e. $\{(x,y) \in A \times A \ \colon \ ...{\scr some~condition~on~}x {\scr~and~}y \}$ ($A \times A$ is the set of ordered pairs of elements of $A$). If $(x,y) \in R$, we write $x \sim y$ or sometimes $x \equiv y$. Confused? It's easier to understand with an example- I'll present two, one of which will turn out to be an equivalence relation while the other will not.

Let $P$ be the set of all living people on earth and for $x,y \in P$, say $x \sim y$ if $x$ and $y$ have the same birthday. This will turn out to be an equivalence relation on $P$.

Here's another example that is not an equivalence relation (you'll see why below). Let $\Bbb R$ be the set of real numbers, i.e. the number line. Don't worry about the precise definition- there will be a few posts dedicated to that later. For now, just think of $\Bbb R$ as the number line which we all know and love since first grade. Then $\leq$ is a binary relation on $\Bbb R$. If we wanted to use the set formulation above, we could call the relation $L = \{ (x,y) \in {\Bbb R} \times {\Bbb R} \ \colon \ x \leq y \}$. Note that the order of $x$ and $y$ clearly matters here, since $5 \leq 7$, but $7 \nleq 5$. So it's important that a binary relation is a set of ordered pairs of elements of our set.

Now for the main event.

An equivalence relation on $A$ is a binary relation $\sim$ on $A$ that satisfies the following three properties for any $x,y,z \in A$:
(1) $x \sim x$ (reflexivity)
(2) If $x \sim y$, then $y \sim x$ as well (symmetry), and
(3) If $x \sim y$ and $y \sim z$, then $x \sim z$ (transitivity).

The birthday relation above clearly satisfies these properties and is thus an equivalence relation on the set $P$ of people. The relation $\leq$ on $\Bbb R$ is not an equivalence relation because, while it does satisfy (1) and (3), it does not satisfy (2), since $5 \leq 7$, but $7 \nleq 5$.
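On a finite set, the three properties can be checked by brute force. Here's a Python sketch (the function `is_equivalence`, the people, and their birthdays are all made up for illustration). Since we can't loop over all of $\Bbb R$, the $\leq$ example is restricted to the numbers 0 through 4, which is enough to see symmetry fail.

```python
def is_equivalence(elements, related):
    """Check reflexivity, symmetry and transitivity of the relation
    `related(x, y)` on a finite collection of elements."""
    elements = list(elements)
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(related(y, x)
                    for x in elements for y in elements if related(x, y))
    transitive = all(related(x, z)
                     for x in elements for y in elements for z in elements
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive

# Hypothetical people and birthdays (month, day), made up for illustration.
birthday = {"Ann": (3, 14), "Bob": (7, 4), "Cat": (3, 14), "Dan": (7, 4)}

def same_birthday(x, y):
    return birthday[x] == birthday[y]

print(is_equivalence(birthday, same_birthday))         # True
print(is_equivalence(range(5), lambda x, y: x <= y))   # False
```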

An equivalence relation partitions the set into equivalence classes. For $x \in A$, the equivalence class of $x$, denoted $[x]$, is defined as the set of elements of $A$ that are equivalent to $x$, i.e. $[x] = \{ y \in A~\colon~y \sim x \}$.

The set of all equivalence classes of $A$ under $\sim$ is called the quotient set of $A$ by $\sim$ and is denoted $A/{\sim}$. $A$ is indeed partitioned by $\sim$ in the sense that:
$$\biguplus_{C \in A/{\sim}} C = A$$
This means that, first of all, the equivalence classes $[x]$ (remember, the $C$'s above are elements of $A/{\sim}$, i.e. equivalence classes $[x]$) cover all of $A$ (that is, $\forall~y \in A,~\exists~x \in A {\scr~such~that~} y \in [x]$. In fact, this $x$ is just $y$ itself, because $y$ is always in $[y]$. Can you see why from the definition of equivalence class?). Notice the little $+$ inside the union symbol? This is the other part of the partition thing, and it means that the sets being "unioned" are disjoint- they do not have any overlap. To be precise, "no overlap" means that their intersection is empty: $\forall~x,y \in A$ with $x \nsim y$, $[x] \cap [y] = \emptyset$.

So the equivalence classes (that is, the elements of the set $A/{\sim}$) cover all of $A$ in the sense that each element of $A$ is in one of them, and they do not overlap. We can actually prove the latter fact using the definition of an equivalence relation:

Let $x,y \in A$ with $x \nsim y$, and assume that $[x] \cap [y] \neq \emptyset$, i.e. $\exists~z \in A$ such that $z \in [x] \cap [y]$ (i.e. $z$ is in both $[x]$ and $[y]$). We are going to prove that this leads to a contradiction. $z \in [x]$ means that $z \sim x$. The symmetry property (2) tells us it's also true that $x \sim z$. Similarly, $z \in [y]$ means that $z \sim y$. Since $x \sim z$ and $z \sim y$, the transitive property (3) of equivalence relations implies that $x \sim y$. But that contradicts the assumption that $x \nsim y$. Thus, there cannot exist such a $z$, and the equivalence classes $[x]$ and $[y]$ do not overlap. So it was legit for us to use the $\biguplus$ symbol above, and an equivalence relation indeed constitutes a partition of $A$.

In fact, any partition also defines an equivalence relation. Suppose we have a collection $\{X_i\}_{i \in I}$ of subsets of $A$, indexed by some index set $I$ (an index set is just a set of labels for a collection of other things, in this case subsets of $A$- usually you use one when you aren't sure whether there will be a countable or uncountable number of items in your collection that need a label, in which case you can't just number the items) such that $A = \biguplus_{i \in I} X_i$. Then the $X_i$'s define an obvious equivalence relation on $A$ by $x \sim y$ if there exists a single $i \in I$ such that $x,y \in X_i$.
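Both directions are easy to try out on a small finite set. The Python sketch below uses made-up names (`equivalence_classes`, `relation_from_partition`) and, as an example relation, "same remainder when divided by 3" on the numbers 0 through 9. It assumes the relation passed in really is an equivalence relation; otherwise comparing against just one representative of each class wouldn't be enough.

```python
def equivalence_classes(elements, related):
    """Split the elements into equivalence classes under `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            # Comparing with one representative suffices (by transitivity).
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new class
    return classes

def relation_from_partition(blocks):
    """Turn a partition back into a relation: x ~ y if and only if some
    block contains both x and y."""
    def related(x, y):
        return any(x in block and y in block for block in blocks)
    return related

def same_remainder(x, y):
    return x % 3 == y % 3

classes = equivalence_classes(range(10), same_remainder)
print(classes)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# Going back from the partition recovers the original relation.
recovered = relation_from_partition(classes)
assert all(recovered(x, y) == same_remainder(x, y)
           for x in range(10) for y in range(10))
```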

Sorry that this post is not that exciting, but equivalence relations are an important concept which I promise will be used in at least one cool post later. To liven things up a bit, there is a surprise, and that is that during the discussion above, we actually proved the so-called Fundamental Theorem of Equivalence Relations, which states that given a set $A$ and an equivalence relation $\sim$ on $A$,
(1) $\sim$ partitions $A$ in the sense described above, and
(2) every partition of $A$ defines an equivalence relation on $A$.

Thanks for reading, and feel free to post any questions in the comments section.

Sets and Important Notations

A set is a collection of distinct objects called elements. Any types of objects can be elements of a set, including numbers, colors, the batman symbol, as well as other sets. The set is completely determined by the elements contained in it.

There are numerous ways to describe a set. The simplest is to list the elements; if $A$ is the set of colors of the American flag, we could write $A = \{\scr red, white, blue\}$. The order that we write the elements in is irrelevant. Only the elements themselves matter.

As another example, if $B$ is the set of integers 1 through 100, we could write $B=\{1,2,...,100\}$.

For more complicated sets, we can use the so-called set-builder notation, which has its own section below.

Membership, Subsets, and Equality of Sets


If an object $x$ is an element of the set $A$, we write $x \in A$. Otherwise, $x \notin A$, i.e. $x$ is not an element of $A$.

Using the examples above, $7 \in B$, ${\scr green} \notin A$, and $100.3 \notin B$.

If we have some sets $C$ and $D$, and $C$ is completely contained in $D$ (in other words, every element of $C$ is also an element of $D$), then we say $C$ is a subset of $D$, and we write $C \subseteq D$, analogous to the $\leq$ symbol for numbers. Sometimes, you will also see it written as $C \subset D$, which I usually reserve for a proper subset. $C$ is a proper subset of $D$ if it is contained in, but not equal to, $D$, i.e. there is at least one element of $D$ that is not in $C$. A proper subset relationship is sometimes also written as $C \subsetneq D$.

Note that the empty set, the set containing no elements and written either as $\{\}$  or $\emptyset$, is a subset of every set. This is because you can make any statement you want about "all elements of the empty set"- there are none, so the statement is automatically true (but also doesn't really tell us anything). In particular, for some set $A$, every element of the empty set is also an element of $A$. For every pig in $\emptyset$, the pig can fly. Get it?

One special set built from a set $A$ is the power set of $A$, denoted $2^{A}$ or ${\cal P} (A)$, whose elements are all the subsets of $A$. Using the example set $A$ above, $$
\begin{aligned}
2^{A} = \{ &\emptyset,
\{ {\scr red} \},
\{ {\scr white} \},
\{ {\scr blue} \}, \\
&\{ {\scr red, white} \},
\{ {\scr red, blue} \},
\{ {\scr white, blue} \},
\{ {\scr red, white, blue} \} \}
\end{aligned}
$$ $2^{A}$ always contains $\emptyset$ and $A$ itself. In this example, $A$ has 3 elements, and $2^{A}$ has $2^3 = 8$ elements. In general, a set with a finite number $n$ of elements has a power set with $2^{n}$ elements, hence the notation $2^{A}$. I kind of prefer the notation ${\cal P} (A)$, but both are common.
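To see the count of $2^3 = 8$ come out of an actual computation, here is a minimal sketch that builds the power set in Python. The helper name `power_set` is my own, and subsets are stored as `frozenset`s because ordinary Python sets can't be elements of other sets.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    # Collect the subsets of size 0, 1, ..., len(items).
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(sub) for sub in subsets}

A = {"red", "white", "blue"}
P = power_set(A)

print(len(P))             # 8
print(frozenset() in P)   # True: the empty set is in the power set
print(frozenset(A) in P)  # True: A itself is in the power set
```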

Finally, if $C \subseteq D$ and $D \subseteq C$, then $C$ and $D$ have the same elements and so are the same set. We write $C=D$.

Set-builder Notation 


Before going into the set-builder notation, there are two more symbols that will be useful.

The $\forall$ symbol stands for "for all" or "for each." So for the example set $B$ above, we could say that $\forall x \in B, 1 \leq x \leq 100$.

The symbol $\exists$ stands for "there exists." Using example set $B$ again, we can write $\forall x \in B, x \neq 100, \exists y \in B$ with $y > x$. This means that for each element $x$ of $B$ (except for 100), there exists another element $y$ of $B$ where $y>x$. Note that $\exists$ does not imply that only one such element exists. In this case, if $x=3$, examples of $y$ fitting the criteria above would be $4, 5, 6, ... , 100$.
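For a finite set, both quantifiers can be checked by brute force. In this Python sketch, `all` plays the role of $\forall$ and `any` plays the role of $\exists$; that correspondence is only an illustration for finite sets, not a general definition.

```python
B = set(range(1, 101))  # the integers 1 through 100

# For all x in B, 1 <= x <= 100.
print(all(1 <= x <= 100 for x in B))  # True

# For all x in B with x != 100, there exists y in B with y > x.
print(all(any(y > x for y in B) for x in B if x != 100))  # True

# The exception matters: no element of B is greater than 100.
print(any(y > 100 for y in B))  # False
```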

In fact, this is a good example to illustrate the set-builder notation. We enclose in curly brackets first a variable for the elements of the set, then (separated by a colon $\colon$ or vertical bar $\vert$ ) a logical predicate which describes which elements are in the set.

Going back to the example above, if $\Psi_{x}$ is the set of elements $y$ of $B$ that are greater than a specified element $x$ (note that in this example, there would be one such set $\Psi_{x}$ for each $x \in B$), then we could write $\Psi_{x} = \{ y \in B \ \vert \ y > x \}$ or $\Psi_{x} = \{ y \in B \ \colon y > x \}$. So $\Psi_{3}$ would be $\{4,5,6,...,100\}$.
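Python's set comprehensions look almost exactly like set-builder notation, so the set $\Psi_{x}$ translates nearly word for word. The function name `psi` below is just my stand-in for $\Psi$.

```python
B = set(range(1, 101))  # the integers 1 through 100

def psi(x):
    """The set { y in B : y > x }."""
    return {y for y in B if y > x}

print(psi(3) == set(range(4, 101)))  # True: psi(3) is {4, 5, ..., 100}
print(len(psi(3)))                   # 97
print(psi(100) == set())             # True: nothing in B exceeds 100
```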

In set-builder notation, the power set can be specified as ${\cal P} (A) = \{ S \ \colon S \subseteq A\}$.

Basic Set Operations, De Morgan's Laws


There are three basic set operations: union, intersection, and complement.

The union of $A$ and $B$, denoted $A \cup B$, contains any element of $A$ or $B$ (including elements in both $A$ and $B$). In symbols, $A \cup B = \{ x \ \colon x \in A \ {\scr or}\  x \in B \}$.
Some properties of unions:
$A \cup B = B \cup A$
$A \cup (B \cup C) = (A \cup B) \cup C$
$A \subseteq A \cup B$
$A \cup A = A$
$A \cup \emptyset = A$
$A \subseteq B {\scr \ if \ and \ only \ if \ } A \cup B = B$
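The union properties above can be spot-checked on a few small sets. This Python sketch uses `|` for $\cup$; the sets are arbitrary examples I picked, and checking a few cases is of course not a proof.

```python
A = {1, 2, 3}
B = {3, 4}
C = {5}

print(sorted(A | B))               # [1, 2, 3, 4]
print(A | B == B | A)              # True: commutative
print(A | (B | C) == (A | B) | C)  # True: associative
print(A <= A | B)                  # True: A is a subset of the union
print(A | A == A)                  # True
print(A | set() == A)              # True

# A is a subset of B if and only if A union B equals B.
print(({3} <= B) == ({3} | B == B))  # True (both sides are True)
print((A <= B) == (A | B == B))      # True (both sides are False)
```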

The intersection of $A$ and $B$ is denoted $A \cap B$ and contains the elements common to $A$ and $B$, i.e. $A \cap B = \{ x \ \colon x \in A \ {\scr and}\  x \in B \}$.
Some properties of intersections:
$A \cap B = B \cap A$
$A \cap (B \cap C) = (A \cap B) \cap C$
$A \cap B \subseteq A$
$A \cap A = A$
$A \cap \emptyset = \emptyset$
$A \subseteq B {\scr \ if \ and \ only \ if \ } A \cap B = A$
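The intersection properties can be spot-checked the same way, with `&` standing in for $\cap$. Again, the sets are arbitrary examples, and a handful of checks is an illustration rather than a proof.

```python
A = {1, 2, 3}
B = {3, 4}
C = {3, 5}

print(A & B)                       # {3}
print(A & B == B & A)              # True: commutative
print(A & (B & C) == (A & B) & C)  # True: associative
print(A & B <= A)                  # True: the intersection is a subset of A
print(A & A == A)                  # True
print(A & set() == set())          # True

# A is a subset of B if and only if A intersect B equals A.
print(({3} <= B) == ({3} & B == {3}))  # True (both sides are True)
print((A <= B) == (A & B == A))        # True (both sides are False)
```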

The complement of $A$, denoted $A^{C}$ or $A^{\prime}$,  contains all objects that are not elements of $A$, i.e. $A^{C} = \{x : x \notin A \}$.

The relative complement, denoted $A \setminus B$ or sometimes $A - B$, is the set of elements of $A$ not contained in $B$. $A \setminus B = \{ x \in A \colon x \notin B \}$. Note that $A \setminus B = A \cap B^{C}$.

Some properties of complements:
$A \setminus B \neq B \setminus A$ for $A \neq B$
$A \cup A^{C} = U$ where $U$ is the universe, i.e. everything
$A \cap A^{C} = \emptyset$
$(A^{C})^{C} = A$
$\emptyset ^{C} = U$ and $U^{C} = \emptyset$
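A computer can't hold "everything," so to try out complements we have to pick a small universe. The sketch below assumes $U = \{0, 1, ..., 9\}$, and the helper `complement` is my own name for taking a complement relative to that assumed $U$.

```python
U = set(range(10))  # an assumed, deliberately small universe
A = {1, 2, 3}
B = {3, 4}

def complement(S):
    """Complement of S relative to the assumed universe U."""
    return U - S

print(A - B == A & complement(B))    # True: relative complement identity
print(A - B != B - A)                # True: here {1, 2} versus {4}
print(A | complement(A) == U)        # True
print(A & complement(A) == set())    # True
print(complement(complement(A)) == A)  # True
print(complement(set()) == U)        # True
print(complement(U) == set())        # True
```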

There are two useful properties known as De Morgan's Laws that combine the operations above:
$(A \cup B)^{C} = A^{C} \cap B^{C}$, and
$(A \cap B)^{C} = A^{C} \cup B^{C}$.
If you picture a Venn Diagram like the ones above, it's easy to see why these are true. The first states that if something is not in $A$ or $B$, then it's not in $A$ and it's not in $B$, and vice versa. The second states that if something is not in both $A$ and $B$, then we know it's either not in $A$ or not in $B$ (or both), and vice versa.
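If you'd rather check De Morgan's Laws than picture them, here is a small sketch that tests both laws for every pair of subsets of a tiny universe. The universe $U = \{0, 1, 2, 3\}$ and the helper names `complement` and `all_subsets` are assumptions made for the example.

```python
from itertools import combinations

U = {0, 1, 2, 3}  # an assumed, deliberately small universe

def complement(S):
    """Complement of S relative to the assumed universe U."""
    return U - S

def all_subsets(s):
    """Return a list of every subset of s."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

subsets = all_subsets(U)

# Check both laws for all 16 * 16 pairs of subsets of U.
first_law = all(complement(A | B) == complement(A) & complement(B)
                for A in subsets for B in subsets)
second_law = all(complement(A & B) == complement(A) | complement(B)
                 for A in subsets for B in subsets)

print(first_law)   # True
print(second_law)  # True
```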

Finally, the Cartesian product of two sets $A$ and $B$ is the set $A \times B$ whose elements are ordered pairs of elements of $A$ and $B$. $$A \times B = \{(a,b)~\colon~a \in A, b \in B \}$$

Sometimes, $A \times A$ is written as $A^{2}$, and similarly $A^{3}$ would be $A \times A \times A$, etc. We will refer to the Cartesian product in some later posts.
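As a last sketch, the standard library's `itertools.product` builds Cartesian products of finite sets directly. The sets here are small made-up examples, and ordered pairs are represented as Python tuples.

```python
from itertools import product

A = {"a", "b"}
B = {1, 2, 3}

AxB = set(product(A, B))          # all ordered pairs (a, b)
A2 = set(product(A, repeat=2))    # A x A
A3 = set(product(A, repeat=3))    # A x A x A

print(len(AxB))                   # 6 = 2 * 3
print(("a", 1) in AxB)            # True
print((1, "a") in AxB)            # False: order matters in an ordered pair
print(len(A2), len(A3))           # 4 8
```

Note that $(1, a)$ belongs to $B \times A$ rather than $A \times B$, which is why the third check comes out false.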

Post any questions in the comments section, and I'll answer them as soon as possible.