
Stars and Bars

This post comes in response to the very first reader request:

My request is for a topic on object set problems that come up often on the GRE: permutations and combinations. I know the basics, like combination = order doesn't matter & permutation = order does matter, and the former is reduced in number from the latter. Where it gets dicey for me is when it comes to situations of repetition vs. no repetition.
 

So, for example, with a combination with repetition (a store sells shirts that come in 6 different colors and I have a coupon to get 3 shirts of whatever color, how many variations of that can I make?) what is the best way to solve for that when you can have two or more colors be the same? I know there are formulas for each scenario that you can use, but I guess I'm curious about the basic proof behind the formulas (especially the combination with repetition).

Indeed, there is a formula for this exact scenario, but since the request asks about the reasoning behind it, let's use the question above to illustrate the thought processes required to solve this problem in a few different ways, and then derive the general formula as a result.

The question here is how many different sets of 3 shirts we can make given 6 choices of shirt color. If we were picking one shirt for Monday, one for Tuesday, and one for Wednesday (i.e. if order mattered), then there would be $6^3 = 216$ choices, since we can choose from a set of 6, 3 times. But since the order does not matter in this question, this would be over-counting a lot of different combos: if we make 3 sequential selections from the pool of 6 colors, there are 3 orderings which result in, for example, 2 blue shirts and 1 red shirt (R-B-B, B-R-B, and B-B-R). And there would be $6=3!$ (that's 3 factorial) different ways to end up with the set B,Y,G: B-Y-G, B-G-Y, G-B-Y, G-Y-B, Y-G-B, and Y-B-G.

As we can see from the two examples above, starting with $6^3$ and then correcting for the repeated orderings is a bit tricky because the amount of over-counting depends on which colors we choose (more precisely, on how many different colors end up in our set of 3). If all 3 shirts are the same color, the set is counted only once; if all 3 shirts are different colors, it is counted 6 times; and if we choose 2 of one color and 1 of another, it is counted 3 times. We need to consider these cases separately in order to correctly count all the possibilities.

Case 1: 3 shirts of 3 different colors
In this case, we just need to choose 3 colors from the 6 possible ones, and then our set is completely determined, so there are $6 \choose 3$ possibilities, where $6 \choose 3$ is the binomial coefficient, calculated as $\dfrac{6!}{3! (6-3)!}$.

Case 2: 2 of one color and 1 of another
In order to choose such a set of 3, we need to first choose two colors from the 6. There are $6 \choose 2$ ways to do that. Next, we need to pick which of the two colors will be the color of 2 shirts and which will be the color of 1. There are 2 options for this, so the number of ways to create a set falling under this case is $2 \times {6 \choose 2}$.

Case 3: 3 shirts of the same color
This is the easiest case- there are clearly 6 ways to choose 3 shirts of all the same color, since all we need to do is choose the color. Note that we can also write this as $6 \choose 1$.

Finally, the answer to this question is $$
\begin{align}
\text{\# 3-shirt sets given 6 color choices} &= \sum_{\text{cases}}{\text{\# per case}} \\[3mm]
&= {6 \choose 3} + \left[ 2 \times {6 \choose 2} \right] + {6 \choose 1} \\[3mm]
&= 20 + [2 \times 15] + 6 \\[3mm]
&= 56
\end{align}
$$ We can actually recover the $6^3$ count from the case where order matters by multiplying each case above by its number of orderings: $6^3 = 216 = (6 \times 20) + (3 \times 2 \times 15) + (1 \times 6)$.
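If you'd like to sanity-check this count by brute force, here's a short Python sketch (the color names are just placeholders for the 6 options): it enumerates all $6^3$ ordered picks, collapses the ordering by sorting, and tallies the cases.

```python
# Brute-force check of the 56 count and the case breakdown above.
from itertools import product
from collections import Counter

colors = ["blue", "red", "green", "yellow", "fuchsia", "orange"]

# All 6^3 = 216 ordered picks, with the order then collapsed by sorting.
unordered_sets = {tuple(sorted(pick)) for pick in product(colors, repeat=3)}
print(len(unordered_sets))                         # 56

# Number of distinct colors in each set: 20 sets use 3 colors,
# 30 use 2 colors, and 6 use a single color.
print(Counter(len(set(s)) for s in unordered_sets))
```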

The general formula: "stars and bars" (or "shirts and bars"...)


In the example above, in the case where we choose 2 colors, say red and blue, for 3 shirts, we need to choose how many of the 3 will be red and how many will be blue. Obviously, there are 2 choices for this: 1-2 and 2-1.

If we had needed to pick, for example, 5 shirts, it would have been a bit more complicated: we could have chosen between 1 and 5 different colors. If we look at the case where we choose 3 different colors (say blue, red, and fuchsia) for 5 shirts, we would need to choose how many shirts would be each of the 3 colors; there are 6 ways to do this: 1-1-3, 1-3-1, 3-1-1, 1-2-2, 2-1-2, and 2-2-1. And that's just one case; we would still have to count the cases with 1, 2, 4, or 5 different colors, so this problem is a lot harder to count directly than the example above.

Luckily, there is an easy tool called stars and bars that helps us visualize and count in these tougher examples. I'm going to use shirts and bars instead since we're talking about shirts, but it's the same thing.

If we have 5 shirts and 3 colors, we can see in the pictures below that there are 5 shirts, and thus $5-1=4$ slots in which to insert $3-1=2$ bars to divide the shirts into 3 groups to be colored in. The second and third pictures correspond to option 1-2-2 from above.



This clearly illustrates that if there are 6 colors, and we need to choose 5 shirts, then in the case where we have 3 different colors, the number of possible combinations is: $$
\begin{align}
&\text{ways to choose 3 colors from 6} \times \text{ways to color 5 shirts with 3 colors} \\[3mm]
&={6 \choose 3} \times {{5-1} \choose {3-1}} \\[3mm]
&= 20 \times 6 \\[3mm]
&= 120
\end{align}
$$ Using this logic, we can see that the general formula for the number of ways to choose $k$ shirts from a pool of $n$ colors is the sum of the numbers for each case: $$
\sum_{j=1}^{k}{{n \choose j}{{k-1} \choose {j-1}}} \tag{1}
$$ where $j$ is the number of different colors, ${n \choose j}$ is the number of ways to choose those colors, and ${{k-1} \choose {j-1}}$ is the number of ways to color the $k$ shirts using the $j$ colors, which we got via stars and bars above. Try this formula for the case where $n=6$ and $k=3$, and you will see we recover the answer 56 from our first example.

Note that the sum in (1) only goes up to $k$, since we can't use more colors than we have shirts (here, I'm assuming $n>k$ as in our example). However, if $j>k$, then $j-1>k-1$, so there are zero ways to choose $j-1$ items from a set of $k-1$, i.e. ${{k-1} \choose {j-1}}=0$. This means we can extend the upper limit of summation up to $n$ without changing the value of the sum. We can also extend the lower limit of summation from 1 to 0, since the ${{k-1} \choose {j-1}}$ factor also zeroes out the term when $j=0$ (there are zero ways to choose $-1$ items from a pool of $k-1$). Note that the same logic holds if $n \leq k$, since then the ${n \choose j}$ factor becomes zero when $j>n$. Thus, regardless of whether $n>k$ or $n \leq k$, we have: $$
\sum_{j=1}^{k}{{n \choose j}{{k-1} \choose {j-1}}}
=
\sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {j-1}}}
$$ but furthermore: $$
\begin{align}
\sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {j-1}}}
&= \sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {(k-1)-(j-1)}}} \\[3mm]
&= \sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {k-j}}} \\[3mm]
&= {{n+k-1} \choose {k}}
\end{align}
$$ where the first equality is due to the fact that for any integers $m,r$ with $0 \leq r \leq m$, we have ${m \choose r} = {m \choose {m-r}}$ (and for $r$ outside that range, both sides are zero). This just means that selecting $r$ items from a pool of $m$ is equivalent to choosing the $m-r$ items not to include in your selection. The third equality is called the Vandermonde Convolution, and if you read the pool handicapping post, you might have guessed that it has both an algebraic and a combinatorial proof. I'll leave them to you to figure out (see the end of this post for a hint on each).
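If you'd rather not work through the proofs right away, here is a quick numerical check of my own (using Python's math.comb) that the case-by-case sum in (1) really collapses to a single binomial coefficient:

```python
# Check that sum_j C(n, j) * C(k-1, j-1) equals C(n+k-1, k) on a small grid.
from math import comb

def multisets_by_cases(n, k):
    # Equation (1): j is the number of distinct colors used.
    return sum(comb(n, j) * comb(k - 1, j - 1) for j in range(1, k + 1))

for n in range(1, 10):
    for k in range(1, 10):
        assert multisets_by_cases(n, k) == comb(n + k - 1, k)

print(multisets_by_cases(6, 3), comb(8, 3))   # 56 56
```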

The fact that the sum from before boils down to a single binomial coefficient suggests that there is a simpler solution to our counting problem. Indeed there is, and it's actually another stars and bars method.

Going back to the 6 colors, 3 shirts example, let's draw 3 shirts and place $6-1=5$ bars anywhere to the left or right of any shirt. We are allowed (and in this example, forced) to place two bars next to each other with no shirt in between. Suppose we then color the shirts in a fixed order (e.g. blue, red, green, yellow, fuchsia, orange): every shirt to the left of the first bar (if any) will be blue, shirts between the first and second bar will be red, shirts between the second and third will be green, and so forth. Shirts to the right of the fifth bar will be orange. So the case of 2 blues and 1 red would look like this:
Note that there are $6+3-1$ objects ($6-1$ of them are bars, and 3 are shirts), and choosing a set of 3 shirts from the 6 color options is equivalent to simply choosing where in the line-up of 8 objects the 3 shirts should go. Thus, the number of ways to do this is ${{6+3-1} \choose {3}} = {8 \choose 3} = 56$, the same answer as before.

To recap, you now know two different stars and bars methods which you can apply in combinatorial counting problems. I hope this was informative and thoroughly addressed the request. If any questions remain, feel free to ask in the comments section.

Lastly, here are the hints for the two proofs of the Vandermonde Convolution:

Algebraic proof: Note that ${{n+k-1} \choose {k}}$ is the coefficient of $x^k$ in the expansion of $(1+x)^{n+k-1}$. What is the coefficient of $x^k$ in the expansion of $(1+x)^{n} (1+x)^{k-1}$?

Combinatorial Proof: Suppose we have a bag of $n+k-1$ (numbered) marbles, of which $n$ are blue and $k-1$ are red. How many different sets of $k$ marbles can we create from the bag? How many ways can we choose $k$ marbles from the bag, where $j$ are blue and $k-j$ are red?

Einstein on Brownian Motion


In 1827, botanist Robert Brown observed through his microscope that pollen particles in resting water exhibited motion which he was unable to explain. This motion came to be known as Brownian motion, which now also has a precise mathematical definition as a stochastic (i.e. random) process.

Fast forward to 1905: the scientific community had not yet accepted, as we do today, that matter consists of atoms. Though analysis based on atoms and molecules had proven useful in some applications, many physicists regarded atoms as a hypothetical tool rather than a physical reality. Alternative theories postulated that the structure of the world consisted, for example, of different forms of energy and energy transformations. Albert Einstein's 1905 paper "On the movement of particles suspended in stationary liquids required by the molecular-kinetic theory of heat" (kind of a mouthful, but hey, the guy was German, so what did you expect?) explained Brownian motion as arising from the random bombardments of the pollen particles by moving water molecules, like in the animation below:

Einstein was able to derive formulas for the mean square displacement of a suspended particle as well as Avogadro's number, and when Jean Perrin experimentally verified these results in 1908, the remaining skeptics accepted this as evidence of the physical reality of atoms.

In this post, I will walk you through Einstein's pivotal analysis of Brownian motion.

Note: nothing in this post is new- it's all copied from various sources, including Einstein's paper, Robert E. Kennedy's book on the same (as well as other Einstein papers), and other sources from around the internet. I found it very difficult to find any single source that provided a satisfactory explanation of every step in Einstein's paper, so I hope this post, which has taken me quite a while, achieves that and is clear enough to follow easily without the need to scour the internet for additional information. If not, please post any questions in the comments section!

The setup


We have a container with volume $V$ filled with liquid (the solvent- let's just call it water going forward) and a solute (e.g. sugar) dissolved therein. The container is divided into two sections by a wall through which the water can flow freely, but which is impermeable to the sugar, and the latter is confined to the walled-off section of the container, which has volume $V^*$. The volume of the remainder of the container is then $V-V^*$.

Since the sugar molecules cannot pass through the wall, they bounce back when they hit it and thus exert a pressure on the wall (the water passes right through the wall and thus does not exert a pressure on it). Van 't Hoff's law states that if the number of moles of sugar in the volume $V^*$ is $z$, then this pressure $\Pi$, known as the osmotic pressure, is given by $\Pi V^* = zRT$, where $R$ is the gas constant and $T$ is the temperature in Kelvin. This is the same as the ideal gas law, which is not totally unexpected since the pressure is due to the sugar molecules bouncing around in the volume $V^*$ the same way a gas in a closed container would.

If, instead of sugar, some larger particles are suspended (but not dissolved, the difference lying solely in the particle size) in the water, according to Einstein, the osmotic pressure on the wall should be given by the same formula as that for solutions above; in other words, the osmotic pressure only depends on the number (or more precisely, the concentration, or number per unit volume) of dissolved/suspended particles. The next section shows his derivation based on the "molecular-kinetic theory of heat."

The osmotic pressure due to suspended particles


In the last post, I showed you Einstein's derivation of a formula for the entropy of $n$ particles in the water bouncing around due to random molecular motion and collisions: $$
S = \frac{\bar{E}}{T} + k \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right] \tag{1}
$$ Here, $\bar E$ is the total energy of the $n$ particles, $T$ is the absolute temperature in Kelvin, $k$ is the Boltzmann constant (equal to the gas constant $R$ divided by Avogadro's number $N$), and the $p_i$'s and $q_i$'s are the state variables of the system of $n$ particles, representing the 3 components of momentum and position of each particle, hence the indices' running up to $3n$. The integral is taken over the different configurations of the system (i.e. different possible values of position and momentum of the particles). The $E$ inside the integral is the energy of the system as a function of the state variables.

The free energy (Helmholtz free energy) of the system is defined as $F = E - TS \ \ \text{(2)}$. If you read the Wikipedia article on Helmholtz free energy, you can see the derivation of the formula $dF = -S \, dT - P \, dV$ via basic calculus (in particular, the product rule for differentiation). From this formula, it's clear that $-\dfrac{\partial F}{\partial V} = P$ when the volume changes by an infinitesimal amount at constant temperature. We'll use this relation later on.

Plugging in equation (1) for $S$ in equation (2), we obtain $$
\begin{align}
F &= -kT \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right] \\[3mm]
&= -kT \ln B \tag{3}
\end{align}
$$ The key insight that makes the nearly-impossible calculation of the integral $B$ unnecessary is that $B$ can be shown to have the form $B = J(V^{*})^n \ \ \text{(4)}$, where $J$ is some function that does not depend on $V^*$. I'll go through the details of that derivation at the end of this post so as not to deviate too far from the more interesting line of reasoning.

Combining equations (3) and (4), we recover van 't Hoff's law for the osmotic pressure $\Pi$: $$
\begin{align}
\Pi &= -\dfrac{\partial F}{\partial V^*} \\[3mm]
&= \dfrac{\partial}{\partial V^*} kT \ln [ J(V^{*})^n ] \\[3mm]
&= \dfrac{\partial}{\partial V^*} kT \, [\ln J + n \ln V^{*} ] \\[3mm]
&= kT\dfrac{n}{V^*} \\[3mm]
&\Downarrow \\[3mm]
\Pi V^* &= (Nk)T\dfrac{n}{N} \\[3mm]
\Pi V^* &= zRT
\end{align}
$$ In the last line, we used the fact that $k = \dfrac{R}{N}$ and denoted the number of moles of particles by $z = \dfrac{n}{N}$. We have shown that van 't Hoff's law is a consequence of the molecular-kinetic theory. This law can equivalently be expressed as $\Pi = kT \nu \ \ (\spadesuit)$, where $\nu = \dfrac{n}{V^*}$ is the concentration of particles in the volume $V^*$.
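To get a feel for the magnitude, here's a quick plug-in of numbers into $(\spadesuit)$; the concentration and temperature are my own illustrative choices, not anything from Einstein's paper.

```python
# Osmotic pressure Pi = k * T * nu for an illustrative dilute solution.
k_B = 1.380649e-23        # Boltzmann constant, J/K
N_A = 6.02214076e23       # Avogadro's number, 1/mol
T = 298.0                 # temperature, K

c = 1.0                   # assumed concentration: 1 mol/m^3 (= 1 mmol/L)
nu = c * N_A              # particles per m^3

Pi = k_B * T * nu         # osmotic pressure in pascals
print(round(Pi), "Pa")    # roughly 2.5 kPa, the same as z*R*T/V* gives
```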

The diffusion coefficient


In deriving a formula for the diffusion coefficient (i.e. the constant in the differential equation governing the diffusion of the particles over time- if you aren't familiar with the diffusion equation, you'll see presently...),  Einstein introduced a fictitious force and used the fact that the free energy is minimized when the system is in dynamic equilibrium. I couldn't fully understand this argument (if you do, please click here and answer my questions!), so instead, I will show a slightly more direct derivation by letting the force due to osmotic pressure be balanced by friction in the fluid.

Suppose again we have $n$ particles in the volume $V^*$, which has a cross-sectional area $A$ perpendicular to the $x$-axis, and that the concentration of the particles $\nu (x)$ may vary with $x$. Consider the volume $\Delta V^*$ between $x$ and $x+ \Delta x$, which contains $N_1$ particles. We have: $$
\frac{\Pi (x) - \Pi (x+ \Delta x)}{\Delta x} = \frac{F(x) - F(x+ \Delta x)}{A \Delta x} = \frac{N_{1} F_{\Pi}}{A \Delta x}
$$ where $\Pi$ is the osmotic pressure. $F(x)$ is the force on the cross-sectional area at $x$ (positive $F$ would be to the right), and $F_{\Pi}$ is the average force on a particle in the region $\Delta V^*$. Taking the limit as $\Delta x \rightarrow 0$ gives: $$
\frac{\partial \Pi}{\partial x} = - \nu F_{\Pi} \tag{5}
$$ Note that both sides are functions of $x$ and time, but I am leaving the $x$'s and $t$'s out to simplify the notation a bit. The $\frac{N_1}{A \Delta x} = \frac{N_1}{\Delta V^*}$ became $\nu$ in the limit since that was the number of particles per unit volume.

The force $F_{\Pi}$ acts on the particles and "competes" with the force of friction in the fluid. The drag force due to friction is proportional to the particle velocity since a faster-moving particle hits more molecules that slow it down; if the particles are spheres, the drag force is given by Stokes' Law: $$
F_d = -6 \pi \mu R_{p} v \tag{6}
$$ where $\mu$ is the viscosity coefficient of the fluid (measurable by simple experiments), $R_p$ is the particle radius, and $v$ is the particle velocity. This lower-case $\pi$ is the usual 3.14. The negative sign reflects the fact that the drag force points in the opposite direction of the velocity. I'm not going to present a derivation of Stokes' Law since it would be very long, so let's take that as a given and plug on.

Before combining all this, we need one more tidbit. Taking  the partial derivative with respect to $x$ on both sides of equation $(\spadesuit)$ from above, we obtain: $$
\frac{\partial \Pi}{\partial x} = kT \frac{\partial \nu}{\partial x}
$$ In dynamic equilibrium, $F_{\Pi} + F_{d} = 0$, so combining (5) and (6) with the above gives: $$
\begin{align}
- \frac{1}{\nu} \frac{\partial \Pi}{\partial x} &= 6 \pi \mu R_{p} v \\[3mm]
\implies -kT \frac{\partial \nu}{\partial x} &= 6 \pi \mu R_{p} \nu v \tag{7}
\end{align}
$$ Now $\nu v$, the density times the velocity, is the flux of particles (at position $x$ and time $t$), i.e. the number of particles passing through a unit area perpendicular to the $x$-axis, per second. Notice that the units are $\dfrac{\text{particles}}{\text{m}^3} \times \dfrac{\text{m}}{\text{s}} = \dfrac{\text{particles}}{\text{m}^2 \ \text{s}}$, as one would expect of such a quantity.

Fick's First Law states that this flux equals $-D \frac{\partial \nu}{\partial x}$, where $D$ is the diffusion coefficient referred to above. The negative sign means that if the particle density is higher on the left, particles tend to diffuse to the right. $D$ has units of $\frac{\text{m}^2}{\text{s}}$ and is proportional to the mean squared displacement of a diffusing particle per unit time (we'll derive the exact relationship below). This Stack Exchange thread has a detailed explanation of the physical interpretation of the diffusion coefficient, and this Wikipedia article presents a quick and simple derivation of Fick's First Law.

Combining Fick's First Law with equation (7), we see that $$
D = \frac{kT}{6 \pi \mu R_{p}} = \frac{RT}{N} \frac{1}{6 \pi \mu R_{p}} \tag{8}
$$
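To see what equation (8) predicts, here is a plug-in with assumed values for a small sphere suspended in water at room temperature (my numbers, chosen only for illustration):

```python
# Stokes-Einstein diffusion coefficient D = k*T / (6*pi*mu*R_p).
from math import pi

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 298.0            # temperature, K
mu = 1.0e-3          # Pa*s, approximate viscosity of water
R_p = 0.5e-6         # m, radius of a 1-micron-diameter particle (assumed)

D = k_B * T / (6 * pi * mu * R_p)
print(f"D = {D:.2e} m^2/s")   # on the order of 4e-13 m^2/s
```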

Root mean squared displacement


Assume the $n$ particles' movements are independent and introduce a time interval $\tau$. This time interval is much shorter than the time intervals of observation but long enough that we can consider the movements of a single particle in consecutive time intervals of length $\tau$ to be independent.

Since the movements of the $n$ particles are independent of each other, we can think of them as $n$ different stochastic processes. In other words, we can consider them $n$ separate observations of results of the same random experiment. In a time interval $\tau$, the $x$-coordinate of the position of any given particle will change by some amount $\Delta$. If a particle moves to the right, $\Delta > 0$, and if it moves to the left, $\Delta < 0$. The values of $\Delta$ follow some probability distribution $\phi (\Delta)$ so that after a time period $\tau$ elapses, the proportion of particles $\frac{dn}{n}$ which have experienced a displacement in the $x$ direction between $\Delta$ and $\Delta + d \Delta$ satisfies the equation $$
\frac{dn}{n} = \phi(\Delta) \, d \Delta
$$ There are a few conditions that the probability distribution $\phi$ should satisfy. First, all probabilities should add up to 1, so $$
\int \limits_{-\infty}^{\infty}{\phi(\Delta) \, d \Delta} = 1
$$ Also, since $\tau$ is small, $\phi( \Delta)$ should be zero except for very small absolute values of $\Delta$. Finally, a particle should be equally likely to have been displaced to the left or right, so $\phi$ should be an even function, i.e. $$
\phi(\Delta) = \phi( - \Delta)
$$ If we denote the particle density by $\nu = f(x,t)$, then by the definition of $\phi$, we have: $$
f(x,t + \tau) = \int \limits_{-\infty}^{\infty}{f(x+ \Delta , t) \phi(\Delta) \, d \Delta} \tag{9}
$$ This equation states that the number of particles at position $x$ at time $t + \tau$ is the sum (integral) of the numbers of particles that were at positions $x + \Delta$ at time $t$ and were displaced by $-\Delta$ in the time $\tau$ (recall that $\phi(-\Delta) = \phi(\Delta)$), summed over the possible values of $\Delta$. Since $\phi(\Delta)$ is assumed to be non-zero only for very small values of $\Delta$, extending the integral out from $-\infty$ to $\infty$ doesn't change the value, but it will be useful below since it will allow us to use well known results for Gaussian integrals.

Now we can expand $f$ into a Taylor series on both sides of equation (9). On the left, we have $$
\begin{align}
f(x, t + \tau) &= f(x,t) + \tau \frac{\partial f}{\partial t} + \frac{1}{2} \tau^{2} \frac{\partial^{2} f}{\partial t^2} + ... \\[3mm]
&\approx f(x,t) + \tau \frac{\partial f}{\partial t}
\end{align}
$$ On the right side, we need to expand out to second order (you'll see why in a second)- note that it is justified to do the Taylor expansion inside the integral since we are expanding around $\Delta = 0$, and only small values of $\Delta$ (i.e. those for which the Taylor expansion is accurate) contribute anything to the integral: $$
\begin{align}
f(x,t) + \tau \frac{\partial f}{\partial t} &\approx
\int \limits_{-\infty}^{\infty}{\left[ f(x,t) + \Delta \frac{\partial f}{\partial x} + \frac{1}{2} \Delta^{2} \frac{\partial^{2} f}{\partial x^2} \right] \phi(\Delta) \, d \Delta} \\[3mm]

&= f(x,t) \int \limits_{-\infty}^{\infty}{\phi(\Delta) \, d \Delta}
+ \frac{\partial f}{\partial x} \int \limits_{-\infty}^{\infty}{\Delta \phi(\Delta) \, d \Delta}
+ \frac{1}{2} \frac{\partial^{2} f}{\partial x^2} \int \limits_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}
\end{align}
$$ The first integral on the right-hand side is equal to 1, and the second is zero since $\phi(\Delta)$ is an even function (so $\Delta \phi(\Delta)$ is an odd function). So the above simplifies to $$
\frac{\partial f}{\partial t} = \frac{1}{2 \tau} \frac{\partial^{2} f}{\partial x^2} \int \limits_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}
$$ For any choice of the distribution $\phi$, the quantity $\frac{1}{2 \tau} \int_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}$ will be a constant, which we call $D$. It follows that the particle density $f$ satisfies the well known diffusion equation (or Fick's Second Law): $$
\frac{\partial f}{\partial t} = D \frac{\partial^{2} f}{\partial x^2} \tag{10}
$$ This $D$ is the same one referred to in Fick's First Law from above (in fact, the same Wikipedia article shows a derivation of the diffusion equation from Fick's First Law). This means that the formula derived above for the diffusion coefficient is valid here, and equation (10) completely describes the evolution of the particle density.

If we consider a separate coordinate system for each particle whose origin is at the particle's position at $t=0$, then equation (10) still holds, except now $f(x,t)$, instead of describing the number of particles at position $x$ at time $t$, would describe the number of particles experiencing a displacement of $x$ from their respective initial positions in an elapsed time $t$. Given the conditions $f(x,0)=0$ for $x \neq 0$ and $\int \limits_{-\infty}^{\infty}{f(x,t) \, dx} = n$, the differential equation (10) is that of diffusion from a point, and its solution is: $$
f(x,t) = \frac{n}{\sqrt{4 \pi D t}} \exp \left(\frac{-x^2}{4Dt} \right)
$$ Since $f(x,t)$ represents a number of particles, $\frac{1}{n} f(x,t)$ is the probability density for a given particle to experience a displacement $x$ in time $t$. Indeed, using the Gaussian integral formula $\int_{-\infty}^{\infty}{e^{-ax^2}dx} = \sqrt{\frac{\pi}{a}}$ with $a=\frac{1}{4Dt}$, we see that $\int_{-\infty}^{\infty}{\frac{1}{n}f(x,t) \, dx}=1$. It follows that the mean value of the squared displacement is: $$
\begin{align}
\left< x^2 \right> &= \int \limits_{-\infty}^{\infty}{x^{2} \frac{1}{n} f(x,t) \, dx} \\
&= \frac{1}{\sqrt{4 \pi Dt}}\int \limits_{-\infty}^{\infty}{x^{2} \exp \left( \frac{-x^2}{4Dt} \right) \, dx} \\[3mm]
&= 2Dt
\end{align}
$$ where the last step utilized the Gaussian integral formula $\int_{-\infty}^{\infty}{x^{2} e^{-ax^2} \, dx} = \frac{1}{2} \sqrt{\frac{\pi}{a^3}}$, again with $a=\frac{1}{4Dt}$. Finally, we see that the root mean squared displacement, a measure of the average distance a particle is expected to travel in time $t$, is $\sqrt{\left< x^2 \right>} = \sqrt{2Dt}$, which is proportional to the square root of $t$.
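If you'd like to see $\left< x^2 \right> = 2Dt$ come out of a simulation rather than the integrals, here is a small Monte Carlo sketch of my own (not from the paper). It uses the standard discretization of Brownian motion: each particle's displacement is built up from independent Gaussian increments with variance $2D\,\Delta t$ per step.

```python
# Monte Carlo check of <x^2> = 2*D*t for 1-D Brownian motion.
import numpy as np

rng = np.random.default_rng(0)

D = 4.4e-13            # m^2/s, e.g. a ~1-micron sphere in water (assumed)
dt = 0.01              # s, time step
n_steps = 1_000        # total elapsed time t = 10 s
n_particles = 100_000

x = np.zeros(n_particles)
for _ in range(n_steps):
    x += rng.normal(0.0, np.sqrt(2 * D * dt), size=n_particles)

t = n_steps * dt
print(np.mean(x**2), 2 * D * t)   # empirical vs. theoretical, nearly equal
```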

This result was verified by experiment not long after Einstein published his paper, and this is often considered the defining result that convinced the remaining skeptics at the time that atoms existed.

I hope you enjoyed reading and please post any questions in the comments section.

For those interested, I promised an appendix for one of the details that was glossed over above:

Appendix: derivation of $B = J (V^*)^n$


Consider the $n$ particles at positions $(x_1 , y_1, z_1 ), ... , (x_n , y_n, z_n )$, each surrounded by an infinitesimal parallelepiped region $dx_i \, dy_i \, dz_i$ which is contained in $V^*$. Consider the same $n$ particles, but now at different positions $(x_i ', y_i ', z_i ')$ and surrounded by parallelepiped regions of the same size as before (thus no label change necessary on those). Writing $$
dB  = J \, dx_1 \, dy_1 \, ... \, dz_n
$$ and $$
dB'  = J' \, dx_1 \, dy_1 \, ... \, dz_n
$$ we see that $\dfrac{dB}{dB'} = \dfrac{J}{J'}$.

The probability that the system is in the first configuration is $dB$ divided by the integral over all configurations, $B$, and likewise, the probability the system is in the second configuration is $\frac{dB'}{B}$. Since the particles move independently and the parallelepiped regions are the same size, the two probabilities must be equal, and so $$
\frac{dB}{B} = \frac{dB'}{B} \implies dB = dB' \implies J = J'
$$ This shows that $J$ is independent of the particle positions and of the volume $V^*$, and so $$
\begin{align}
B = \int dB &= \int \limits_{V^*}{J \, dx_1 \, dy_1 \, ... \, dz_n} \\[3mm]
&= J \int \limits_{V^*}{dx_1 \, dy_1 \, ... \, dz_n} \\[3mm]
&= J (V^*)^n \ \tag*{$\square$}
\end{align}
$$

Einstein's Formula for Entropy

Entropy is often considered a measure of "disorder" which, as you may recall from chemistry, is supposed to increase over time. A physical system tends to evolve in such a way that its useful energy dissipates. The value of entropy measures how far along the system is in that process. There are a few different formulations which each capture this idea; each is a formula that must be calculated as opposed to a physically observable property of the system (like temperature or pressure would be).

In this post, I'm going to show you Einstein's derivation of a formula for entropy, which will also shed some light on what exactly this quantity represents. This formula will be an ingredient in the forthcoming post on Einstein's Brownian motion paper, so pay attention!

Imagine a system consisting of $n$ atoms of an ideal gas in a closed container ($n$ will be a big number, like on the order of $10^{23}$ big). Actually, it doesn't need to be a gas, but that just seems to be the easiest to picture. We'll take the gas to be ideal and monatomic, meaning we can ignore the rotational energy of the atoms as well as interactions between them. In other words, we are concerned only with their translational motion and assume all collisions are elastic. Finally, we assume the container of gas is surrounded by an ambient reservoir, with which it can exchange heat, so that the system's temperature $T$ remains pretty much constant.



Each atom has a position and a momentum, each a 3-dimensional vector, i.e. a vector with 3 components. At any given time, the $2 \times 3n = 6n$ components of position and momentum for all the atoms, denoted $q_1, q_2, ..., q_{3n}$ and $p_1, p_2, ..., p_{3n}$ respectively, determine the configuration of the system, and the set of all possible configurations is called the state space (also known as phase space), a subset of $\Bbb R ^ {6n}$. The $q_i$'s and $p_i$'s are called the state variables of the system.

The First Law of Thermodynamics states that energy can be converted into different forms but not created or destroyed, and thus as the system evolves, its change in energy is the work done on the system plus the heat supplied to the system: $$
dE = dW + dQ
$$ If the system is described by a set of parameters $\lambda_1, \lambda_2, ... , \lambda_m$ and the state variables mentioned above, then if it undergoes a small change over a time interval $dt$, the resulting change in energy is given by: $$
dE = \sum_{i=1}^{m}{\frac{\partial E}{\partial \lambda_i}\frac{d\lambda_i}{dt}dt}
+ \sum_{j=1}^{3n}{\left( \frac{\partial E}{\partial q_j}\frac{dq_j}{dt}dt + \frac{\partial E}{\partial p_j}\frac{dp_j}{dt}dt \right)} \tag{$\spadesuit$}
$$ To be a bit more concrete, the "parameters" above would be things like the volume of the container, so the first sum is identified with the work done on the system (usually just $dW = -P \, dV$ for work done on the system), and the second sum is identified with the heat supplied to the system; more heat in the system (all else equal) means higher temperature, i.e. the atoms bounce around faster, and so the second sum is our $dQ$. In the example we're talking about, the energy is only due to the translational kinetic energy of the atoms, and thus the energy would be $E=\sum_{j=1}^{3n}{\frac{p_j^2}{2m}}$. There would also be a term involving the $q_j$'s if we took into account gravitational potential energy, which depends on the particles' positions.

The above formula doesn't depend on which path the system takes through state space, so in particular, it holds for an adiabatic change, i.e. one in which the system does not gain/lose any heat so $dQ=0$. Before such a change occurs, the probability of finding the system in a state with energy $E$ is given by the volume of a little box in state space times its probability density: $$
d{\Bbb P} = Ce^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}
$$ The probability density $Ce^{\frac{-E}{kT}}$ deserves a bit of attention. $T$ is the temperature of the system, considered to be constant as mentioned above, $k$ is the Boltzmann constant, $1.38 \times 10^{-23}$ Joules/Kelvin (units of energy per temperature), which makes the exponent dimensionless, and $C$ is a constant which makes all the probabilities add up to 1. We'll solve for $C$ in a moment, but why is the probability density given by an exponential?

If we have two pockets of gas in the container with energies $\epsilon_1$ and $\epsilon_2$, since the model is that they are independent, the probability of finding pocket 1 at energy $\epsilon_1$ and finding pocket 2 at energy $\epsilon_2$ should be the product of the individual probabilities of finding the respective pockets at those energy levels. Furthermore, the energies add, so that the total energy of the two pockets combined is $\epsilon_1 + \epsilon_2$. The exponential function has the desired property: $$
\exp \left( \frac{-(\epsilon_1 + \epsilon_2)}{kT} \right) = \exp \left( \frac{-\epsilon_1}{kT} \right) \exp \left( \frac{-\epsilon_2}{kT} \right)
$$ The fact that the "drop-off factor" in the denominator of the exponent is proportional to $T$ is a consequence of the fact that the average kinetic energy of an atom in the gas is $\frac{3}{2}kT$, which in turn follows from the ideal gas law $PV = NkT$. For a simple and insightful derivation of the $e^{\frac{-E}{kT}}$ form based on Maxwell's analysis, click here.
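As a quick numerical illustration of the $\frac{3}{2}kT$ claim (a sketch of my own): under the Maxwell distribution each velocity component is Gaussian with variance $kT/m$, so sampling velocities and averaging the kinetic energy should land right on $\frac{3}{2}kT$.

```python
# Sample Maxwell-Boltzmann velocities and compare <KE> with (3/2) k T.
import numpy as np

rng = np.random.default_rng(1)

k_B = 1.380649e-23       # Boltzmann constant, J/K
T = 300.0                # temperature, K
m = 6.6e-26              # kg, roughly the mass of an argon atom (assumed)

v = rng.normal(0.0, np.sqrt(k_B * T / m), size=(1_000_000, 3))
mean_ke = np.mean(0.5 * m * (v**2).sum(axis=1))

print(mean_ke, 1.5 * k_B * T)    # the two values should nearly agree
```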

Back to the main line: in order to solve for $C$, we note that the probabilities of all possible configurations must add up (integrate) to 1. Probabilities are always non-negative, so we can assume that $C$ is of the form $e^c$, and thus: $$
\begin{align}
1 &= \int{d{\Bbb P}} \\[3mm]
&= \int e^c e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm]
&= e^c \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm]
& \Downarrow \\[3mm]
c &= - \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right]
\end{align}
$$ The integrals above are taken over the entire range of possible values of the $q_i$'s and $p_i$'s, i.e. over all state space.

After our adiabatic change in the system, there will be a similar expression for $d{\Bbb P}$, except that $c$ may have shifted a bit to $c + dc$, and $\beta$ to $\beta + d\beta$ (where we are now defining $\beta := \frac{1}{2kT}$ for notational convenience). The energy $E$ will also shift to $E + dE = E + \sum_{i=1}^{m}{\frac{\partial E}{\partial \lambda_i}\frac{d\lambda_i}{dt}dt}$. Here, we've used equation $(\spadesuit)$ and the fact that $dQ = 0$, so only the $dW$ term comes into play. To save on symbols, I'll also start referring to the $\frac{d\lambda_i}{dt}dt$'s simply as $d \lambda$.

Now similar to the above, we have: $$
\begin{align}
1 = &\int{d{\Bbb P}} \\[3mm]

=  &\int \exp \left( (c+dc)-2(\beta+d\beta)\left(E + \sum{\frac{\partial E}{\partial \lambda} d\lambda}\right) \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm]

 = &\int \exp
\left(
dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right)
\right) \\[1mm]
&\times \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}
\end{align}

$$ We can expand the first exponential into a Taylor series and then neglect the terms past first order: $$
\begin{align}
1 = \int
&\left[
1 + dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right)
+ \frac{1}{2} (dc - 2 (...))^2 + ... \right] \\[1mm]

& \times \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm]

\approx \int
&\left[
1 + dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right) \right] \\[1mm]

& \times \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm]

= \int &\exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[1mm]

+ \int & \left[
dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right) \right] \\[1mm]

& \times \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm]


= \ \ \ & 1 \\[1mm]

+ \int & \left[
dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right) \right] \\[1mm]

& \times \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm]


\Downarrow \\[3mm]

0  \ \approx \int & \left[
dc - 2
\left(
E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda}
\right) \right]  \exp \left( c-\frac{E}{kT} \right)
dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\

\end{align}

$$ Note: in the line before the $\Downarrow$, the first integrand is the probability density, so its integral over all state space must equal 1. Also, after the $\Downarrow$, we dropped the last term in the square brackets because we ignored terms above first order, i.e. terms containing products of two or more differentials.

Since $\exp \left( c-\frac{E}{kT} \right)$ is never negative, the only way the integral in the last line above can equal 0 is if the expression in square brackets equals 0. Thus:$$
dc -2E \, d\beta - 2 \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} = 0 \tag{1}
$$ On the other hand, multiplying the equation $dE = \sum{\frac{\partial E}{\partial \lambda}d\lambda} + dQ$ by $2 \beta$ and rearranging gives: $$
-2 \beta \, dE + 2 \beta \sum{\frac{\partial E}{\partial \lambda} d \lambda} + 2 \beta \, dQ = 0 \tag{2}
$$ Adding $(1)$ and $(2)$ eliminates the $\lambda$'s and yields: $$
\begin{align}
0 &= dc - 2E \, d\beta - 2\beta \, dE + 2\beta \, dQ \\[2mm]
&= dc -2 (E \, d\beta + \beta \, dE) + 2 \beta \, dQ \\[2mm]
&= dc -2 \, d(\beta E) + 2 \beta \, dQ \\[2mm]
\implies 2 \beta \, dQ &= d(2 \beta E - c) \\[3mm]
\implies \frac{dQ}{T} &= d \left( \frac{E}{T}-kc \right) := dS
\end{align}
$$ In the last step, we plugged in $\beta = \frac{1}{2kT}$ and then multiplied through by the constant $k$.

We have shown that $\frac{dQ}{T}$ is the total differential of some quantity related to energy and temperature, which we call entropy and denote by $S$. Evidently, $S$ is given by: $$
S = \frac{E}{T} + k \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right]
$$ where we have used the formula for $c$ which was derived above. This entropy equation will be used in the next post on Brownian motion- stay tuned.

Thanks for reading, and please post any questions in the comments section.

Handicapping a Race to 7

Karl and Richard are playing a series to 7 (no need to win by 2) in a two-player game, and their skill levels are such that Karl has a probability $p$ of winning any individual game in the series.

If we give Karl a handicap of 2 games so that he only needs to win 5 games to win the series, whereas Richard needs to win 7, what is the single-game probability $p$ that gives each player a 50% chance of winning the series?


I've tagged this post with "Billiards" since the question came from the director of my pool league (who I hear is an avid GTM reader). Actually, what he really wants to know is the best way to set up the scoring, tie-breakers, and handicaps in one of the leagues, but in order to answer these questions, I wanted to start by looking at just one series and then extend that to the more general questions. This analysis really has nothing to do with pool though, so it would work for any other game as well.

Binomial Coefficients


To start out, we'll need the binomial coefficients $\binom{n}{k}$, which are read as "$n$ choose $k$." $\binom{n}{k}$ is the number of ways to choose $k$ items from a set of $n$ distinct items, for example the number of $k$-person boards of directors that can be chosen from $n$ candidates. Note that choosing the $k$ candidates who are included in the board is the same as choosing the $n-k$ who are not included, which means that $\binom{n}{k} = \binom{n}{n-k}$. The formula for the binomial coefficients is $$\dbinom{n}{k} = \dfrac{n!}{k! (n-k)!}$$ from which the above-mentioned symmetry is obvious. Note that even though this is a fraction, it always comes out to be an integer.

The binomial coefficients got their name from the fact that they are the coefficients in the expansion of binomials: $$ (x+y)^n = \sum_{k=0}^{n}{\dbinom{n}{k} x^k y^{n-k}}
$$ This is because in the product $(x+y)(x+y)...(x+y)$ (with $n$ factors of $(x+y)$), each factor of $(x+y)$ contributes either an $x$ or a $y$ to each term in the expansion. If you have $k$ $x$'s in a term, the other $n-k$ factors of $(x+y)$ must have contributed a $y$. There are $\binom{n}{k}$ ways to get $k$ $x$'s and $n-k$ $y$'s, hence the formula. If anyone wants more detail on that, just ask in the comments, and I'll give a more detailed explanation.

Most identities about binomial coefficients can be proved either by using the formula with the factorials, or via a combinatorial argument. For example, for integers $n$, $m$, and $k$ with $0 \leq k \leq m \leq n$, we have the following identity, known as the subset of a subset identity: $$\dbinom{n}{m} \dbinom{m}{k} = \dbinom{n}{k} \dbinom{n-k}{m-k}
$$Algebraic proof: $$
\begin{align}
\dbinom{n}{m} \dbinom{m}{k} &= \dfrac{n!}{m! (n-m)!} \cdot \dfrac{m!}{k! (m-k)!} \\[3mm]
&= \dfrac{n!}{k!(n-m)!(m-k)!} \\[3mm]
&= \dfrac{n!}{k! (n-k)!} \cdot \dfrac{(n-k)!}{(n-m)! (m-k)!} \\[3mm]
&= \dbinom{n}{k} \dbinom{n-k}{m-k} \tag*{$\square$}
\end{align}
$$Combinatorial proof:
The left side of the identity is the number of ways to choose a board of directors with $m$ members from $n$ candidates, and then choose $k$ executive members from the $m$. The right side counts the number of ways to choose $k$ executive members from the $n$ candidates and then choose the $m-k$ non-executive board members from the $n-k$ remaining candidates. These count the same thing, so the two sides must be equal. $\tag*{$\square$}$
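As a quick sanity check (not part of either proof), the identity can also be verified numerically with Python's math.comb:

```python
# Brute-force verification of the subset-of-a-subset identity on a small grid.
from math import comb

for n in range(12):
    for m in range(n + 1):
        for k in range(m + 1):
            assert comb(n, m) * comb(m, k) == comb(n, k) * comb(n - m, m - k)

print("identity holds for all 0 <= k <= m <= n < 12")
```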

Winning a series to 7


In order to win a series to 7, without needing to win by 2, Karl needs to win 7 games, with Richard winning anywhere from 0 to 6 games in the series. If Karl wins 7 games, and Richard wins 3 games (for example), there will be a total of 10 games in the series. The 3 games that Richard does win can come anywhere among the 10 games except the 10th: if Richard won the 10th game, then Karl would already have won his 7 games among the first 9, and the series would have ended before game 10. So we can choose from the first $10-1=9$ games where Richard's 3 wins go.

The probability that Karl wins a given game is $p$, which means the probability that Richard beats Karl is $1-p$. Combining all this, we can see that the probability that Karl beats Richard in a race to 7, with Richard winning 3 games, is $$ \binom{10-1}{3} p^7 (1-p)^3
$$Since Karl can win the series with Richard winning anywhere from 0 to 6 games, the total probability that Karl wins the series is the sum over the possible outcomes, with the summation index $k$ being the number of wins Richard gets in the series: $$
\begin{align}
{\Bbb P}(\text{Karl wins the series}) &= \sum_{k=0}^{7-1}{\binom{7+k-1}{k} p^7 (1-p)^k} \\[3mm]
&= \sum_{k=0}^{6}{\binom{6+k}{k} p^7 (1-p)^k}
\end{align}
$$ Here's a graph of the probability that Karl wins the series, for different values of $p$:

 

Not surprisingly, if there's a 50% chance that either player wins an individual game, then there's also a 50% chance that either player wins the series.
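Here is the series-win probability as code, just a direct transcription of the sum above (nothing new), which confirms the 50-50 observation:

```python
# P(Karl wins a race to 7) when he wins each game with probability p.
from math import comb

def p_wins_series(p, target=7):
    # k = number of games Richard wins before Karl reaches `target` wins.
    return sum(comb(target - 1 + k, k) * p**target * (1 - p)**k
               for k in range(target))

print(p_wins_series(0.5))   # 0.5
print(p_wins_series(0.6))   # noticeably better than 0.6
```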

Now, let's say we give Karl a handicap of 2 games so that to win the series, Karl needs to win 5 games and Richard needs to win 7. More generally, if we call the handicap $H$, where $0 \leq H \leq 6$, then by the same reasoning as we used above, we get the modified formula: $$
{\Bbb P}(\text{Karl wins the series}) = \sum_{k=0}^{6}{\binom{6-H+k}{k} p^{7-H} (1-p)^k}
$$ Now Karl only needs to win $7-H$ games, so the total number of games in the series when Richard wins $k$ of them is $7-H+k$, with Richard's $k$ wins once again being placed anywhere but the last game.

Here are the graphs of Karl's probabilities of winning the series given different values of $p$ and $H$ (you can click to expand the picture):


Now, I'd love to be able to say we're done here, but the fact is that for some real Karl and Richard, we have no idea what the value of $p$ is unless we are lucky enough to have a history of, say, 100 games between these two players. And even then, they could have improved over time or gotten rusty or whatever, so that games they played a few months ago aren't so telling now as to the value of $p$.

We do know that every player in the league is assigned a ranking (which directly determines the handicap against an opponent) which is certainly partly subjective and determined based on observation by a few very experienced players who run, and possibly play in, the league. Instead of trying to guess $p$ and then assigning the rankings, which would be useless in the absence of a large history of games between each set of two players, we can use the handicaps to back out the value of $p$ that makes the match 50-50. For example, if Karl and Richard's rankings are such that Karl gets a handicap of 3, we can see from the graph above that the match will be 50-50 if Karl's probability $p$ of winning an individual game is about 35.5%.

Using Excel's Goal Seek functionality, I've backed out the values of $p$ that make a 7-game series 50-50 for different handicaps:
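If you don't have Excel handy, the same back-out can be done in a few lines of Python (a sketch of my own, not how the table was actually produced): since each $f_H$ is increasing on $[0,1]$, simple bisection on $f_H(p) = \tfrac{1}{2}$ does the job.

```python
# Back out the p that makes a handicapped race to 7 a 50-50 proposition.
from math import comb

def f(p, H):
    # Karl races to 7 - H, Richard races to 7 (the formula above).
    return sum(comb(6 - H + k, k) * p**(7 - H) * (1 - p)**k for k in range(7))

def fair_p(H, tol=1e-10):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid, H) < 0.5 else (lo, mid)
    return (lo + hi) / 2

for H in range(7):
    print(H, round(fair_p(H), 4))   # H = 3 gives roughly 0.355
```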


To test whether a player's handicap is appropriate, one could take all that player's games against opponents of different ranks and see what percentage of individual games he wins and how far off those percentages are from the table above (perhaps using a chi-square test for goodness of fit). If there are not enough games to do this analysis for individual players, then one could start by looking at the percentages for all games and then looking into the ranks furthest away from the table values and seeing if the stats of any particular player(s) are driving the difference. That's a bit of a manual exercise, but it's a start...

GTM Reader Challenge


"Backing out" $p$ basically means finding the inverse of the function $f(p) = \sum_{k=0}^{6}{\binom{6-H+k}{k} p^{7-H} (1-p)^k}$. We know the function has an inverse because if you look at the graphs, they all pass the horizontal line test. To be more rigorous, they are polynomials in $p$ and thus continuous, and $f(0)=0$ and $f(1)=1$, so the intermediate value theorem tells us that $f$ is surjective. Furthermore, $f$ is increasing on the interval $p \in [0,1]$, so it's also one-to-one, and thus has an inverse.

Now, while Excel Goal Seek will certainly work for this, it would be kind of nice to know the inverse function, so I worked for a few hours today trying to figure out how to invert $f$, but couldn't quite figure it out. Maybe one of my more nerdy readers wants to take a crack at it? Otherwise, maybe I'll go post the question on stack exchange...

[Update 7/29/2015: there's been some confusion on the question I'm asking, so just to clarify, for the purposes of finding the inverse of $f(p)$, assume that the $H$ in the formula above is a constant. So technically, there is a different function $f$ for each value of $H$, which I guess you could call $f_{H}(p)$ or something.]

What I was trying (and maybe this isn't the best way to go about it) was to find a not-too-awful formula for the coefficient of $p^n$ in the sum above and then try to use the Lagrange inversion formula, but it gets a bit messy with all the binomial coefficients. I tried to expand the $(1-p)^k$, turning the sum into a double sum, then switch the order of summation (making sure to adjust the summation limits- the Iverson bracket is helpful at this step), and finally simplify somehow using identities of the binomial coefficients such as the subset of a subset identity above, but said simplification proved elusive, so I didn't even bother with the inversion formula.

Anyone have any thoughts on that or maybe a different way to find the inverse of $f(p)$? Let me know in the comments or email me, and I can provide more details of the computation I tried.

Thanks for reading, and I will try to do a follow-up on this post soon. As always, feel free to ask questions in the comments section.

Recursions Part 2: Generating Functions

Prerequisites: Power Series, Recursions Part 1

Ok folks, it's been a long and relaxing vacation for me, which is why you haven't seen any new posts in the past 3 weeks. I had intended to do this one Live aus Berlin, but it turned out there was lots of better stuff to do there (including the literally 5 hours I spent at the absolutely fantastic Deutsches Historisches Museum; and I only left because they were closing. I started at Charlemagne at 1 PM and was only half-way through the WW2 section when they closed at 6- I still had the entire GDR and cold war to go...), so this one is coming to you Live aus London instead. Hope that's ok...who am I kidding? I'm pretty sure only like 5 people actually read this anyway, so of course it's ok! Also, those of you who know me won't be surprised to learn that I am writing this from the pub, so just a fair warning that the quality may well deteriorate towards the end...

In the last post, I did a brief overview of how we can use a power series to represent a function, and we wanted the series to converge (pointwise) to our function within a neighborhood of $x=0$, or some other central point. In the post before that, I showed you how to solve a simple recursion. In that post, the recursion we came up with was of degree 1, which means that the $n$-th term depended only on the $(n-1)$-th term, and so we could move them both to the same side of the equals sign to get an expression for the $n$-th difference $R(n)-R(n-1)$. We then summed up the differences to get a formula for $R(n)$ in terms of $n$.

That method was pretty fresh, but if the recursive definition of $R(n)$ includes terms before just $R(n-1)$, for example $R(n-2)$, then this method won't work anymore. What I'm going to show you in this post is one of the most clever things I've ever seen, and it is a very powerful method for dealing with higher-degree recursions (I will specify what this means soon if you haven't guessed it already). Basically, what we are going to do is find a function whose power series $\sum_{n=0}^{\infty}{c_n z^n}$ has coefficients $c_n$ which are the $n$-th terms of the recursion we want to solve. Once we back out the coefficients using Taylor's theorem, we will have solved the recursion. Whoever thought of this was wicked creative (and that's not a subtle self-call, because it definitely wasn't me).

What's interesting is that with generating functions, we just need the power series, but we really won't care when it converges, because we are just using it to back out the coefficients. I'll go into more detail on this point below, but it's a big departure from how we looked at power series before, where we pretty much only cared about if, when, and how (i.e. pointwise, uniformly, etc.) the series converged to our function.

Without further ado, let's get crackin'.

Types of Recursions


I started a bit on this above, but a recursion of degree $d$ is one in which the $n$-th term $R(n)$ depends only on some or all of the terms $R(n-1)$ through $R(n-d)$, with $R(n-d)$ actually appearing (otherwise the degree would be smaller). So $R(n) = R(n-1) + 4nR(n-3)$ is a recursion of degree 3, for example. Note that for a recursion of degree $d$, we need to be given the values of the first $d$ terms in order to solve it.

A recursion equation is linear if the formula for $R(n)$ is a linear combination of the previous terms, i.e. (for some degree $d$) $$
R(n) = a_{n-1} R(n-1) + a_{n-2} R(n-2) + ... + a_{n-d} R(n-d) + \alpha (n)
$$ where the $a_i$'s can either be numbers or depend on $n$, but can't contain any earlier terms, and $\alpha (n)$ is some fixed function of $n$ called the particularity function. If the latter is just 0, then the recursion (or recurrence, not sure if I'm using the word 100% correctly, but whatever, it's the same thing in my mind) is called homogeneous. If the recursion formula has, for example, products of earlier terms it in, then it isn't linear.

Here are some examples of recursions:

Tower of Hanoi
This one arises when we analyze that game with the 3 rods on which we have discs of varying diameter- you know, this one:
The object of the game is to move all the discs from rod 1 to rod 3 by moving only one at a time from the top of a stack, and with the proviso that we may never place a larger one on top of a smaller one. If there are $n$ discs, then the minimum number of moves $h_n$ it takes to accomplish this is given by the recursion formula $$
h_n = 2h_{n-1} + 1 \\
h_1 = 1
$$See if you can derive this formula on your own by assuming the number of moves for $n-1$ discs is given and then solving the game for $n$ discs. This is a linear, non-homogeneous recursion of degree 1.

Fibonacci Numbers
The Fibonacci numbers are ubiquitous, arising from the problem of reproducing pairs of rabbits, the golden ratio (which the ratios of consecutive Fibonacci numbers approach; a spiral drawn through squares with Fibonacci side lengths is called a Fibonacci spiral), the number of binary strings of length $n$ without consecutive 1's, etc. The recursion formula is $$
f_n = f_{n-1} + f_{n-2} \\
f_1 = 1, f_2 = 1
$$ which is linear (with constant coefficients, i.e. they don't depend on $n$ as the coefficients are both 1) of degree 2 and homogeneous.

Catalan Numbers
These are ubiquitous in the field of combinatorics, arising from all sorts of counting problems. The recursion formula is $$
C_n = C_0 C_{n-1} + C_1 C_{n-2} + ... + C_{n-1} C_0 \\
C_0 = 1
$$ which is non-linear, not of finite degree (because the number of previous terms in the formula depends on $n$), and homogeneous.

Generating Functions


Now that we've defined what types of "harder" recursions we can encounter, let's move on to the main topic, which is solving them using generating functions.

A generating function for a recurrence $R(n)$ is a formal power series $\sum_{n=0}^{\infty}{c_n z^n}$ where the $n$-th coefficient $c_n$ is equal to the $n$-th recurrence value $R(n)$ for every value of $n$. Technically, I have defined an ordinary generating function (OGF); there are other types, such as exponential and Poisson generating functions, which are easier to apply in different situations, but they all have the same idea of the $n$-th coefficient's coinciding with the $n$-th thing that we are looking for. I'm only going to talk about OGF's in this post.

In the definition above, I mentioned a formal power series. What this is is a power series where we don't care at all about whether it converges or, if so, for which values of $z$. Basically, we are just going to use the $z$'s as a tool to help keep track of our coefficients $c_n$ without any regard for the convergence properties of the series.

To add and subtract two formal power series, we just add/subtract the coefficients of the terms with the same power of $z$, and to multiply them, we use the distributive property- basically, all the same way that we would do it for a regular power series. The same applies to taking derivatives and integrals of the series.

To show how to use OGF's to solve a recursion, let's take as an example the following linear, homogeneous recursion of degree 2: $$
g_n = 5g_{n-1} - 6g_{n-2} \\
g_0 = 1, g_1 = 2
$$ What we need to do is find a function $G(z)$ which has a power series whose $n$-th coefficient is equal to $g_n$.

Here's how it's done. We'll multiply both sides of the recursion equation by $z^n$, sum them from $n = 2$ to $n = \infty$ (starting at 2 because the recursion formula starts at $n=2$, with $g_0$ and $g_1$ given), and then get an algebraic expression in terms of $G(z)$. Heeeeeeeeeere we go:$$
\begin{align}
g_n &= 5g_{n-1} - 6g_{n-2} \\[3mm]

g_n z^n &= 5g_{n-1}z^n - 6g_{n-2}z^n \\[3mm]

\sum_{n=2}^{\infty}{g_n z^n} &=
\sum_{n=2}^{\infty}{5g_{n-1}z^n} - \sum_{n=2}^{\infty}{6g_{n-2}z^n} \\[3mm]

\sum_{n=2}^{\infty}{g_n z^n} &=
5z \left( \sum_{n=2}^{\infty}{g_{n-1}z^{n-1}} \right) -
6z^2 \left( \sum_{n=2}^{\infty}{g_{n-2}z^{n-2}} \right) \\[3mm]

\sum_{n=0}^{\infty}{g_n z^n} - g_1 z - g_0 &=
5z \left( \sum_{n=1}^{\infty}{g_{n-1}z^{n-1}} - g_0 \right) -
6z^2 \left( \sum_{n=2}^{\infty}{g_{n-2}z^{n-2}} \right) \\[3mm]

G(z) - g_1 z - g_0 &=
5z \left( G(z) - g_0 \right) -
6z^2 \left( G(z) \right) \\[3mm]

G(z) (1 - 5z + 6z^2) &= g_1z + g_0 - 5g_0 z \\[3mm]

G(z) &= \dfrac{(g_1 - 5g_0)z + g_0}{1- 5z + 6z^2}\\[3mm]

G(z) &= \dfrac{(1 - 3z)}{1- 5z + 6z^2} \\[3mm]

G(z) &= \dfrac{(1 - 3z)}{(1 - 3z)(1 - 2z)} \\[3mm]

G(z) &= \dfrac{1}{1 - 2z} \\[3mm]

G(z) &= \sum_{n=0}^{\infty}{2^n z^n}

\end{align}
$$ Note that in the last step, we used the known power series $$\dfrac{1}{1-x} = \sum_{n=0}^{\infty}{x^n}$$ and substituted in $2z$ for $x$.

Now, as discussed in the Power Series post, this series only converges when $|x| < 1$ or $|z| < \frac{1}{2}$. But guess what? WE DON'T CARE! Because this was just a way to get the coefficients, and we got them. Thus we have solved the recursion; the solution is $$
g_n = 2^n
$$ for all values of $n$. You can verify this by plugging the answer back into the recursion equation, and you'll see that both sides of the equals sign balance.
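If you'd rather let a computer do the plugging-in, here's a quick sanity check in Python (just a throwaway snippet, nothing official):

```python
# Build g_n from the recursion and compare against the claimed solution 2^n.
g = [1, 2]  # g_0 = 1, g_1 = 2
for n in range(2, 11):
    g.append(5 * g[n - 1] - 6 * g[n - 2])

print(all(g[n] == 2 ** n for n in range(11)))  # True
```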

So that is pretty awesome and illustrates the idea behind OGF's. If you don't think that was wicked cool, then I would have to recommend more..."primitive" forms of entertainment for you, such as MTV's hit series Jersey Shore (which, I must admit, I am a big fan of myself...).

Now, the astute reader probably did think that was wicked cool, but is still just a tad bit bugged by the fact that this example was so obviously contrived so that $G(z)$ had a denominator which factored easily to give a really simple answer.

The astute reader is correct. If we look at the exact same recursion, except change one of the initial conditions from $g_0 = 1$ to $g_0 = 0$, we can repeat the exact same process to arrive at $$
G(z) = \dfrac{2z}{(1-3z)(1-2z)}
$$ which does not easily reduce to a known power series.

But not to worry; we can use a method called partial fraction decomposition to rewrite this quotient as $$
\begin{align}
G(z) &= \dfrac{2}{1-3z} + \dfrac{-2}{1-2z}\\[3mm]
G(z) &= \sum_{n=0}^{\infty}{2 \cdot 3^n z^n} + \sum_{n=0}^{\infty}{(-2) 2^n z^n} \\[3mm]
&\Downarrow \\[3mm]
g_n &= 2 \cdot 3^n - 2^{n+1}
\end{align}
$$ Now, I don't really want to go into detail about partial fraction decomposition, because it's not that exciting, and you can look it up really quickly on Google if you want to know more about it. But I did want to point it out as a useful method when you end up with $G(z)$'s for which you can't just factor the denominator and then use the known power series for $\dfrac{1}{1-x}$.
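That said, just so the constants above don't feel like magic, here's the one-line version of how they were found in this particular example: write $G(z) = \dfrac{A}{1-3z} + \dfrac{B}{1-2z}$, clear the denominators, and plug in convenient values of $z$: $$
\begin{align}
2z &= A(1-2z) + B(1-3z) \\[3mm]
z = \tfrac{1}{3}: \quad \tfrac{2}{3} &= \tfrac{1}{3}A \ \Rightarrow \ A = 2 \\[3mm]
z = \tfrac{1}{2}: \quad 1 &= -\tfrac{1}{2}B \ \Rightarrow \ B = -2
\end{align}
$$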

Another thing I want to point out here is that, although we just looked at an example of a homogeneous recursion, the OGF method also works for non-homogeneous recursions such as the Tower of Hanoi one above.

If you work out the same steps for that one, you'll get the generating function $$
H(z) = \dfrac{z}{(1-z)(1-2z)} = \dfrac{1}{1-2z} - \dfrac{1}{1-z}
$$ and thus $h_n = 2^n - 1$, once again by using the known series of $\dfrac{1}{1-x}$.
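In case you want to check your work on that one, here's a sketch of the key steps, using the convention $h_0 = 0$ so that the sums can start at $n = 0$ (that convention is my choice here, not something forced on us): $$
\begin{align}
\sum_{n=2}^{\infty}{h_n z^n} &= 2z \sum_{n=2}^{\infty}{h_{n-1} z^{n-1}} + \sum_{n=2}^{\infty}{z^n} \\[3mm]
H(z) - h_1 z - h_0 &= 2z \left( H(z) - h_0 \right) + \dfrac{z^2}{1-z} \\[3mm]
H(z) - z &= 2z H(z) + \dfrac{z^2}{1-z} \\[3mm]
H(z) \left( 1 - 2z \right) &= z + \dfrac{z^2}{1-z} = \dfrac{z}{1-z}
\end{align}
$$ and dividing by $1-2z$ gives the $H(z)$ above. The extra $\sum{z^n}$ term coming from the non-homogeneous "+1" is the only new wrinkle.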

Just to sum up here, I'll show you the OGF's of the Fibonacci and Catalan numbers mentioned above.

For the Fibonacci's, the OGF is $$
F(z) = \dfrac{z}{1-z-z^2}
$$ You need to use a partial fraction decomposition to extract the coefficients, which turn out to be $$
f_n = \dfrac{1}{\sqrt 5} \cdot (\gamma^n - {\hat{\gamma}}^n)
$$ where $\gamma$ is the golden mean $\dfrac{1+\sqrt 5}{2}$, and $\hat \gamma$ is its conjugate $\dfrac{1-\sqrt 5}{2}$. This formula for $f_n$ is called the Binet formula for the Fibonacci numbers, named after Jacques Binet, who rediscovered it in 1843 after, you guessed it, our prolific friend Euler had published it in 1765.
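If you want to convince yourself numerically that this formula really does spit out whole Fibonacci numbers, here's a throwaway Python check (I'm using the convention $f_0 = 0$, which is consistent with the OGF above; the rounding just cleans up floating-point noise):

```python
from math import sqrt

gamma = (1 + sqrt(5)) / 2      # golden mean
gamma_hat = (1 - sqrt(5)) / 2  # its conjugate

# Fibonacci numbers from the recursion, with f_0 = 0, f_1 = 1 (so f_2 = 1).
f = [0, 1]
for n in range(2, 21):
    f.append(f[n - 1] + f[n - 2])

binet = [round((gamma ** n - gamma_hat ** n) / sqrt(5)) for n in range(21)]
print(f == binet)  # True
```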

I also used the Catalan numbers as an example, and you can click here to see the generating function and formula for these, from the very cool math blog of a guy named Mike Spivey. Who knew there was another math blog out there? Well, there you have it, there is! [Update: as I mention in the comments in response to Joe's question, I didn't do any research on the history of the Binet formula above when I first wrote this post, but now that I have looked into it, I found an interesting history of collaboration as well as independent discovery among famous mathematicians in various nations (frequently at war with each other during the period of European history in question) as they encountered these prolific number sequences and developed generating function and other methods in the 18th and 19th centuries. Here is a link to a good history of the Catalan numbers.]

And with that, there you have it- that's the end of this post. I hope you enjoyed it and thought that OGF's are as cool as I think they are. Just another example of how math and creativity are absolutely NOT mutually exclusive (in fact, quite the opposite, though I must admit that I myself have zero creativity and just copy most of this stuff from somewhere else that I've seen it...)!

Thanks for reading, and please post any questions in the comments section.

Power Series

As the title suggests, this post is going to cover power series, and while I haven't listed any prerequisites at the top, this will probably be a bit "hand-wavy" for those who have not seen sequences/series or calculus before. Sorry, but I am not going to write a series of posts on these topics, because it will take a long time and is already covered all over the internet, but for a refresher on that, I would direct you to Paul's Online Math Notes, which is a great website written by a math professor at Lamar University and actually where I originally learned this post-calculus I stuff. He has a whole section on sequences and series as part of an entire online course on calculus I-III. I'll also address any specific questions in the comments section, so feel free to ask there.

So, why am I writing this post at all? As of today, June 12th, 2015, there were 83 page views on the previous post about solving a recursion, making it one of the most popular to date given the short period of time it's been up. In that post, we went through how to solve a simple recursion by shuffling terms around to arrive at a sum in terms of the differences between successive terms. This method, sweet though it is, does not work for more complicated recursions, but there is a super clever method to solve the latter by finding their "generating function," which is actually so clever that it's going to blow your mind. That will involve a sort of "stepchild" of power series, so this post is really going to serve as a refresher and prerequisite for that. The idea of a power series (and other series representations of functions, e.g. Fourier series) is also really cool in its own right, and was in fact a reader request (ok well, technically, a request from a guy who not only doesn't read this, but also said he still won't read it even if I make a post about power series), so even if said "reader" doesn't see it, it's still a solid topic.

The sum of an infinite series


To talk about power series, we need to have a notion of the sum of an infinite series. A series is a sum of an infinite list of numbers (said list, without the sum, is called a sequence). If that sequence is the list of numbers $(a_n)_{n \in {\Bbb N}}$ (and notice I've used a common notation for sequences here- this means the list of terms $a_n$ with the index $n$ encompassing the counting numbers ${\Bbb N} = \{ 0,1,2,3,... \}$), then the series would be written as the infinite sum $$
\sum_{n=0}^{\infty}{a_n}
$$ To be precise, this infinite sum, like all infinite things in the study of real numbers, really refers to a limit of finite sums. In particular, the sum $\sum_{n=0}^{\infty}{a_n}$ is the limit as $N$ goes to infinity, of the sequence of partial sums $S_N = \sum_{n=0}^{N}{a_n}$. Get it? The partial sums are a sequence, and the sum of the infinite series is the limit of the sequence. In symbols, $$
\sum_{n=0}^{\infty}{a_n} \buildrel {\scr def} \over{=}
\lim_{N \rightarrow \infty} S_N = \lim_{N \rightarrow \infty} \sum_{n=0}^{N}{a_n}
$$ Note that the symbol $\buildrel {\scr def} \over{=}$ stands for "equals by definition." Clearly, in order for this sequence of $S_N$'s to converge to a (non-infinite) number, the numbers $S_N$ need to "level off" and stop getting bigger.
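A quick concrete example before moving on: for the series $\sum_{n=0}^{\infty}{\left( \frac{1}{2} \right)^n}$, the partial sums are $$
S_N = \sum_{n=0}^{N}{\left( \tfrac{1}{2} \right)^n} = 2 - \left( \tfrac{1}{2} \right)^N \ \longrightarrow \ 2 \quad \text{as } N \rightarrow \infty
$$ so the $S_N$'s level off at 2, and we say the series converges to 2. (This is a geometric series; much more on those below.)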

The difference between $S_{N+1}$ and $S_N$ is $a_{N+1}$, so in order for the partial sums to level off, we need the terms to get smaller, or technically closer to zero (i.e. $|a_n|$ gets smaller), since we could be talking about negative numbers, not just positive. So in order for the sum of the infinite series to exist, it must be the case that $\lim_{n \rightarrow \infty} |a_n| = 0$. By the way, when the differences $|S_M - S_N|$ can be made as small as we like by taking $M$ and $N$ both large enough (note that this is a stronger requirement than just having the consecutive differences $|S_{N+1} - S_N| \rightarrow 0$), we say that the sequence $(S_N)_{N \in {\Bbb N}}$ is a Cauchy sequence, named after Augustin-Louis Cauchy, one of the pioneers of analysis, the field of math that made a lot of our intuition about the number line (and functions thereof, and calculus) rigorous. Without analysis, we wouldn't be able to have a lot of the modern theory that's extensively used in practice today, like the Black-Scholes formula and the rest of stochastic calculus, for example. So the guy is a big deal. Now, I'm not going to go into it here, but I did want to point out that there will probably be not just one, but a whole series of posts on the real numbers later, and in those we'll go into more detail about convergence of sequences. It turns out that sequences of real numbers converge to a limit if and only if they are Cauchy sequences. That kind of makes sense, because if the terms of the sequence aren't eventually getting closer and closer together, then how can they possibly be converging to a limit? But that intuition requires some proving which I'm not going to get into right now. Anyway, onwards with the whole series thing.

Now, it turns out that just the fact that $\lim_{n \rightarrow \infty} |a_n| = 0$ isn't enough to guarantee that the series converges. The $a_n$'s need to go to zero fast enough. As an example, take the series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + ... + \frac{1}{n} + ...$. Here, the terms certainly approach zero in absolute value since $\frac{1}{n}$ gets smaller and smaller as $n \rightarrow \infty$, but they don't go to zero fast enough for the series to have a finite sum. To prove this, note that $$
\begin{align}
&1 + \frac{1}{2} + (\frac{1}{3} + \frac{1}{4}) + (\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}) + (\frac{1}{9} + ...\\[3mm]
> \ &1 + \frac{1}{2} + (\frac{1}{4} + \frac{1}{4}) + (\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}) + (\frac{1}{16} + ...\\[3mm]
= \ &1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + ...
\end{align}$$ which clearly just keeps adding up to bigger and bigger partial sums and thus diverges.

On the other hand, the series $$1 + \frac{1}{2^2} + \frac{1}{3^2} + ... + \frac{1}{n^2} + ...$$ does converge, and its sum is $\dfrac{\pi^2}{6}$. The proof of this is pretty slick, and of course the first one to come up with it was Euler in 1735 (though he didn't come up with a truly rigorous proof until 1741). I mean, seriously, this guy was an absolute animal- they didn't even have light bulbs yet, and he came up with this ridiculous sum. Anyway, the series $$\sum_{n=1}^{\infty}{\dfrac{1}{n^p}}$$ is called a $p$-series and converges if and only if $p>1$. I'm not going to get into the proof, but it's a nice example of the "going to zero fast enough" thing. As it turns out, there are a whole slew of tests to see if a series converges- the integral test, the alternating series test, the comparison test, etc. You might remember them from your calculus II class. They aren't that exciting, but I just wanted to mention all of the above, because it is important when it comes to understanding what we're talking about when we speak of a power series and the convergence properties thereof.
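If you want to watch the Basel series crawl toward $\frac{\pi^2}{6}$, here's a throwaway Python snippet (it converges rather slowly, which is part of the "fast enough" story):

```python
from math import pi

target = pi ** 2 / 6
partial = sum(1 / n ** 2 for n in range(1, 100_001))  # first 100,000 terms
print(partial, target)
# The partial sum agrees with pi^2/6 to about 4 decimal places;
# the leftover error is roughly 1/100000, the size of the tail we cut off.
```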

Speaking of power series, let's move on to those now, but if there are any questions on the above discussion, the comments section is the place to ask.

Approximating a function using polynomials


A polynomial is a function of the form $p(x) = c_0 + c_1 x + c_2 x^2 + ... + c_n x^n$. The $c_i$ numbers are called coefficients, and the degree of the polynomial is the highest power of $x$ with a non-zero coefficient.

Polynomials are pretty easy to work with (and, in particular, to differentiate and integrate), so they are a decent choice if we want to approximate another function. If you look at the animated diagram below, you'll see the function $f(x) = e^x$ in blue, and in red, approximations of $f(x)$ using polynomials of increasing degree. This is a great animation ("borrowed", as usual, from Wikipedia, thank you very much) which really demonstrates the main idea I want to get at.

If we look around $x=0$, the approximation is pretty good, and gets better as we increase the degree of the polynomial approximation. When $n=0$, it's the dumbest imaginable "approximation" of $f$: a flat line at the level $f(0)$. For $n=1$, we have a line passing through the point $(0, f(0))$ and having the same slope as $f$ at that point. A little better... The higher-degree approximations take into account more and more of the curvature properties of $f$ and thus begin to "hug" the curve better around $x=0$, and actually further out as well.

This line of reasoning raises an interesting question: if we let the degree of the polynomial approximation approach infinity (and we'd have to really think carefully about what that would entail), could we get an "infinite polynomial" (whatever that means) that is exactly equal to our function for all values of $x$, or at least in a neighborhood of $x=0$?

In order to understand these types of objects, we'll need the machinery from the section above to actually make sense of such an infinite polynomial. Because this thing is a series involving powers of $x$, it's called a (yes, this is what we've been waiting for...wait for it...) power series.

Power series


Ok, I may have been a bit hasty with the boldface font up there, because I still haven't told you exactly what a power series is. To define it more precisely, a power series is a function of the form $$
p(x) = \sum_{n=0}^{\infty}{c_n x^n}
$$ where the coefficients $c_n$ are some numbers which may depend on $n$ (but which do not depend on $x$).

For a given value of $x$, the numbers $c_n x^n$ are just some new numbers, which we could just as well call $a_n$, and now we are looking at the exact same thing as in the first section about infinite series of numbers. The same rules apply about convergence, which is really the key topic here, since the whole point was to come up with a power series which converges, at least for certain values of $x$, to the value of some other function $f(x)$.

It's worth noting that we are talking about a particular type of convergence here which, in the field of analysis, is called pointwise convergence: at each point $x$, we want the sequence (it's a different sequence of numbers for each value of $x$) of partial sums $$
p_{N}(x) = \sum_{n=0}^{N}{c_n x^n}
to converge to the limit $f(x)$. There are other types of convergence such as uniform convergence, $L^2$-convergence, etc., which I'm not going to go into, but it is important to be clear about what is meant when one says "convergence" in the context of functions; indeed, you often hear of a sequence of functions $p_{n}(x)$ converging to a (limit) function $f(x)$. This could be interpreted in the sense of pointwise convergence, which would mean that the sequence of numbers $p_{1}(x), p_{2}(x), ...$ converges to the number $f(x)$ for each value of $x$, or it could mean that, over a range like $0 \leq x \leq 1$, the maximum difference between the $p_{n}(x)$'s and $f(x)$ approaches 0 as $n \rightarrow \infty$ (that would be uniform convergence). Anyway, this is all very detailed, but I'm mentioning it so that you understand we are talking about pointwise convergence and not something else. So with that, let's get back to the discussion at hand.

Given a function $f(x)$, we are looking for some coefficients $c_n$ which make the series $p(x) = \sum_{n=0}^{\infty}{c_n x^n}$ converge (pointwise) to $f(x)$ for every value of $x$, or at least for some values of $x$ close enough to $x=0$.

As an example, you might recall from precalculus or a similar class in high school that a geometric series is a series whose terms are $a_0 = a, \ a_1 = ar, \ a_2 = ar^2, \ ... \ , \ a_n = ar^n, \ ... $ where $r$ is a fixed ratio and $a$ is some fixed first term. Clearly, if $|r| \geq 1$ (and $a \neq 0$), then the terms of the series $\sum_{n=0}^{\infty}{ar^n}$ do not get closer to zero as $n$ gets larger, which means the series diverges, i.e. does not converge to a finite limit. If $|r| < 1$, then the series does converge, and the sum is $\dfrac{a}{1-r}$. To prove this, note that $$
\begin{align}
\sum_{n=0}^{N}{ar^n} &= a + ar + ar^2 + ar^3 + ... + ar^N \\[3mm]
&= a (1 + r + r^2 + r^3 + ... + r^N) \\[3mm]
&= a \dfrac{1-r^{N+1}}{1-r} \tag{$\clubsuit$} \\[3mm]
\end{align}
$$ where $(\clubsuit)$ is true because if you multiply $(1-r)$ by $(1 + r + r^2 + ... + r^N)$, you'll see that you get $1-r^{N+1}$. We're also assuming that $|r| < 1$ so that $1-r \neq 0$ and thus it's safe to divide by it in equation $(\clubsuit)$. So anyway, now we have $$
\begin{align}
\sum_{n=0}^{\infty}{ar^n} &= \lim_{N \rightarrow \infty} \sum_{n=0}^{N}{ar^n} \\[3mm]
 & \buildrel (\clubsuit) \over {=} \lim_{N \rightarrow \infty} a \dfrac{1-r^{N+1}}{1-r} \\[3mm]
&= a \lim_{N \rightarrow \infty} \dfrac{1-r^{N+1}}{1-r} \\[3mm]
&= a \dfrac{1}{1-r} \\[3mm]
&= \dfrac{a}{1-r}
\end{align}
$$ since $r^{N+1} \rightarrow 0$ as $N \rightarrow \infty$, and the rest of that fraction is independent of $N$. So that's a neat little formula, but if we change the name of $r$ to $x$ (and there's no reason we can't do that) and take the particular case of $a=1$, then we have shown that for values of $x$ in the range $|x|<1$ (i.e. $-1 < x < 1$), the power series $$ \sum_{n=0}^{\infty}{x^n} $$ converges pointwise to the function $f(x) = \dfrac{1}{1-x}$. Since the power series representation only works for $-1 < x < 1$, we say that the series has a radius of convergence of 1.

There are lots of common functions with known power series representations, such as $e^x$, $\cos x$, $\sin x$, etc. In fact, combining those 3, you can prove the famous identity involving everyone's favorite math symbols, $e^{i \pi} = -1$. What??? Yup, it's true, as weird as it is.

Those readers who are familiar with calculus can go ahead and keep reading here, though the above is all you will need for Recursions Part 2.

Now that we've gone through approximating a function with polynomials of increasing degree, I just want to point out 3 more things.

The first is that if $f(x) = \sum_{n=0}^{\infty}{c_n (x-c)^n}$ has a radius of convergence $R$ (and note here that I've generalized a bit and centered the series around $x = c$ instead of $x=0$), then the derivative of the function is $f'(x) = \sum_{n=0}^{\infty}{n c_n (x-c)^{n-1}}$, and there's a similar formula for $\int f(x) dx$. Furthermore, both of these also have radius of convergence $R$. Note that since we are dealing with infinite series, not finite sums, these two facts are not trivial, and you have to go through a little limit of partial sums argument to prove them.
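As a concrete example of the first point, take the geometric series from earlier: differentiating $\dfrac{1}{1-x} = \sum_{n=0}^{\infty}{x^n}$ term by term gives $$
\dfrac{1}{(1-x)^2} = \sum_{n=1}^{\infty}{n x^{n-1}}
$$ and this new series has the same radius of convergence, namely 1.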

The second point is that if a function $f(x)$ is $k$ times differentiable at a point $c \in {\Bbb R}$, then there exists a remainder function $R_{k}(x)$ such that the $k$-th order Taylor polynomial $$ \sum_{n=0}^{k}{\dfrac{f^{(n)}(c)}{n!}(x-c)^n} $$ differs from $f(x)$ by $R_{k}(x) (x-c)^k$, where $R_{k}(x) \rightarrow 0$ as $x \rightarrow c$. Here, $f^{(n)}(c)$ denotes the $n$-th derivative of $f$ evaluated at the point $c$. There are a few subtleties, but intuitively, these Taylor polynomials represent increasingly accurate polynomial approximations, as in the earlier diagram. If a function is infinitely differentiable at $x=c$, then (ignoring the subtleties once again) you can let $k$ go to $\infty$ to obtain the Taylor series of $f$. For "nice enough" functions, i.e. ones that aren't contrived to be counterexamples, the Taylor series converges to $f$, but there are a few wacky examples where it actually converges to something else. But that's Taylor's Theorem for you, which basically provides us with a formula to find the $c_n$'s we wanted in order to approximate a function using polynomials.
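The classic example here is $f(x) = e^x$ centered at $c = 0$: every derivative of $e^x$ is $e^x$ again, so $f^{(n)}(0) = 1$ for all $n$, and the Taylor series is $$
e^x = \sum_{n=0}^{\infty}{\dfrac{x^n}{n!}} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + ...
$$ which converges to $e^x$ for every value of $x$ (this is one of the "nice enough" functions). Truncating this series at degree $n$ gives exactly the kind of polynomial approximations of $e^x$ pictured at the start of this section.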

The third and final point before we wrap up is that there are other types of series that approximate functions as well. A notable example is Fourier series, which are very important in the world of digital signals. A Fourier series approximates a function using sine waves of different frequencies. The details of how this works could be an entire post, but just look at this picture of the first four partial sums of the Fourier series for the square wave function, and it will kind of make sense:


Then there's the sort-of-related Fourier transform, which decomposes a signal into its constituent frequencies, and which can be seen as a sort of limiting version of a Fourier series, where the sums become integrals.

Ok, that's enough of this topic. I hope you enjoyed it if you made it this far, and I did go through it pretty fast, so feel free to ask any questions that are still lingering in your mind in the comments section.