
Stars and Bars

This post comes in response to the very first reader request:

My request is for a topic on object set problems that come up often on the GRE: permutations and combinations. I know the basics, like combination = order doesn't matter & permutation = order does matter, and the former is reduced in number from the latter. Where it gets dicey for me is when it comes to situations of repetition vs. no repetition.

So, for example, with a combination with repetition (a store sells shirts that come in 6 different colors and I have a coupon to get 3 shirts of whatever color, how many variations of that can I make?) what is the best way to solve for that when you can have two or more colors be the same? I know there are formulas for each scenario that you can use, but I guess I'm curious about the basic proof behind the formulas (especially the combination with repetition).

Indeed, there is a formula for this exact scenario, but since the request asks about the reasoning behind it, let's use the question above to illustrate the thought processes required to solve this problem in a few different ways, and then derive the general formula as a result.

The question here is how many different sets of 3 shirts we can make given 6 choices of shirt color. If we were picking one shirt for Monday, one for Tuesday, and one for Wednesday (i.e. if order mattered), then there would be $6^3 = 216$ choices, since we can choose from a set of 6, 3 times. But since the order does not matter in this question, this would be overcounting a lot of different combos: if we make 3 sequential selections from the pool of 6 colors, there are 3 orderings which result in, for example, 2 blue shirts and 1 red shirt (R-B-B, B-R-B, and B-B-R). And there would be $6=3!$ (that's 3 factorial) different orderings which end up with the set B,Y,G: B-Y-G, B-G-Y, G-B-Y, G-Y-B, Y-G-B, and Y-B-G.

As we can see from the two examples above, starting with $6^3$ and then subtracting out the repetitions is a bit tricky because the number of repetitions actually depends on which colors we choose (more precisely, on how many different colors end up in our set of 3). If each shirt is the same color, there are no repetitions to count; if all 3 shirts are different colors, there are 6 repetitions. And if we choose 1 of one color and 2 of another, there are 3 repetitions. We need to consider these cases separately in order to correctly count all the possibilities.

Case 1: 3 shirts of 3 different colors
In this case, we just need to choose 3 colors from the 6 possible ones, and then our set is completely determined, so there are $6 \choose 3$ possibilities, where $6 \choose 3$ is the binomial coefficient, calculated as $\dfrac{6!}{3! (6-3)!}$.

Case 2: 2 of one color and 1 of another
In order to choose such a set of 3, we need to first choose two colors from the 6. There are $6 \choose 2$ ways to do that. Next, we need to pick which of the two colors will be the color of 2 shirts and which will be the color of 1. There are 2 options for this, so the number of ways to create a set falling under this case is $2 \times {6 \choose 2}$.

Case 3: 3 shirts of the same color
This is the easiest case: there are clearly 6 ways to choose 3 shirts of all the same color, since all we need to do is choose the color. Note that we can also write this as $6 \choose 1$.

Finally, the answer to this question is $$
\text{# 3-shirt sets given 6 color choices} &=  \sum_{\text{cases}}{\text{# per case}} \\[3mm]
&= {6 \choose 3} + \left[ 2 \times {6 \choose 2} \right] + {6 \choose 1} \\[3mm]
&= 20 + [2 \times 15] + 6 \\[3mm]
&= 56
$$ We can actually back out the $6^3$ from the case where order matters based on the above by multiplying by the number of repetitions in each case: $6^3 = 216 =  (6 \times 20) + (3 \times 2 \times 15) + 6$.
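These counts are small enough to verify by brute force. Here is a minimal Python sketch (the numeric color labels are arbitrary):

```python
from collections import Counter
from itertools import product

colors = range(6)  # 6 color choices, labeled 0..5

# ordered picks (one shirt per day): 6^3 = 216
ordered = list(product(colors, repeat=3))

# unordered sets: collapse each ordered pick to a sorted tuple
multisets = {tuple(sorted(p)) for p in ordered}
print(len(ordered), len(multisets))  # 216 56

# break the 56 down by number of distinct colors (cases 1 through 3)
by_case = Counter(len(set(m)) for m in multisets)
print(by_case[3], by_case[2], by_case[1])  # 20 30 6
```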

The general formula: "stars and bars" (or "shirts and bars"...)

In the example above, in the case where we choose 2 colors, say red and blue, for 3 shirts, we need to choose how many of the 3 will be red and how many will be blue. Obviously, there are 2 choices for this: 1-2 and 2-1.

If we had needed to pick, for example, 5 shirts, it would have been a bit more complicated: we could have chosen between 1 and 5 different colors. If we look at the case where we choose 3 different colors (say blue, red, and fuchsia) for 5 shirts, we would need to choose how many shirts would be each of the 3 colors; there are 6 ways to do this: 1-1-3, 1-3-1, 3-1-1, 1-2-2, 2-1-2, and 2-2-1. There are also numerous other cases to count, so this problem is a lot harder to count directly than the example above.

Luckily, there is an easy tool called stars and bars that helps us visualize and count in these tougher examples. I'm going to use shirts and bars instead since we're talking about shirts, but it's the same thing.

If we have 5 shirts and 3 colors, we can see in the pictures below that there are 5 shirts, and thus $5-1=4$ slots in which to insert $3-1=2$ bars to divide the shirts into 3 nonempty groups, one per color. The second and third pictures correspond to option 1-2-2 from above.

This clearly illustrates that if there are 6 colors, and we need to choose 5 shirts, then in the case where we have 3 different colors, the number of possible combinations is: $$
&\text{ways to choose 3 colors from 6} \times \text{ways to color 5 shirts with 3 colors} \\[3mm]
&={6 \choose 3} \times {{5-1} \choose {3-1}} \\[3mm]
&= 20 \times 6 \\[3mm]
&= 120
$$ Using this logic, we can see that the general formula for the number of ways to choose $k$ shirts from a pool of $n$ colors is the sum of the numbers for each case: $$
\sum_{j=1}^{k}{{n \choose j}{{k-1} \choose {j-1}}} \tag{1}
$$ where $j$ is the number of different colors, ${n \choose j}$ is the number of ways to choose those colors, and ${{k-1} \choose {j-1}}$ is the number of ways to color the $k$ shirts using the $j$ colors, which we got via stars and bars above. Try this formula for the case where $n=6$ and $k=3$, and you will see we recover the answer 56 from our first example.
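Formula (1) is easy to check in code against a direct enumeration (a quick sketch in Python; `combinations_with_replacement` generates exactly the unordered selections with repetition that we are counting):

```python
from math import comb
from itertools import combinations_with_replacement

def shirt_sets(n, k):
    # formula (1): j is the number of distinct colors used
    return sum(comb(n, j) * comb(k - 1, j - 1) for j in range(1, k + 1))

for n, k in [(6, 3), (6, 5)]:
    brute = len(list(combinations_with_replacement(range(n), k)))
    print(n, k, shirt_sets(n, k), brute)  # formula and brute force agree
```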

Note that the sum in (1) only goes up to $k$, since we can't select more colors than the number of shirts (here, I'm assuming $n>k$ as in our example). However, if $j>k$, then the number of ways to choose $j-1$ items from a set of $k-1$ is 0, so ${{k-1} \choose {j-1}}=0$. This means we can extend the upper limit of summation up to $n$ without changing the value of the sum. We can also extend the lower limit of summation from 1 to 0, since the ${{k-1} \choose {j-1}}$ factor is zero when $j=0$ (i.e. there are zero ways to choose $-1$ items from a pool of $k-1$). Note that the same logic holds if $n \leq k$, since then the $n \choose j$ factor becomes zero when $j>n$. Thus, regardless of whether $n>k$ or $n \leq k$, we have: $$
\sum_{j=1}^{k}{{n \choose j}{{k-1} \choose {j-1}}}
= \sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {j-1}}}
$$ but furthermore: $$
\sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {j-1}}}
&= \sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {(k-1)-(j-1)}}} \\[3mm]
&= \sum_{j=0}^{n}{{n \choose j}{{k-1} \choose {k-j}}} \\[3mm]
&= {{n+k-1} \choose {k}}
$$ where the first equality is due to the fact that for any integers $0 \leq r \leq m$, we have ${m \choose r} = {m \choose {m-r}}$. This just means that selecting $r$ items from a pool of $m$ is equivalent to choosing the $m-r$ items not to include in your selection. The third equality is called the Vandermonde Convolution, and if you read the pool handicapping post, you might have guessed that it has both an algebraic and a combinatorial proof. I'll leave them to you to figure out (see the end of this post for a hint on each).
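The collapse to a single binomial coefficient can also be sanity-checked numerically (a small sketch; Python's `math.comb` conveniently returns 0 when the lower index exceeds the upper, matching the convention used above):

```python
from math import comb

def case_sum(n, k):
    # left-hand side: formula (1), summing over the cases
    return sum(comb(n, j) * comb(k - 1, j - 1) for j in range(1, k + 1))

# check against C(n+k-1, k) both when n > k and when n <= k
for n in range(1, 10):
    for k in range(1, 10):
        assert case_sum(n, k) == comb(n + k - 1, k)
print("identity verified for 1 <= n, k <= 9")
```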

The fact that the sum from before boils down to a single binomial coefficient suggests that there is a simpler solution to our counting problem. Indeed there is, and it's actually another stars and bars method.

Going back to the 6 colors, 3 shirts example, let's draw 3 shirts and place $6-1=5$ bars anywhere to the left or right of any shirt. We are allowed (and in this example, forced) to place two bars next to each other with no shirt in between. Suppose we then color the shirts in a fixed order (e.g. blue, red, green, yellow, fuchsia, orange): every shirt to the left of the first bar (if any) will be blue, shirts between the first and second bars will be red, those between the second and third green, and so forth. Shirts to the right of the fifth bar will be orange. So the case of 2 blues and 1 red would look like this:
Note that there are $6+3-1$ objects ($6-1$ of them are bars, and 3 are shirts), and choosing a set of 3 shirts from the 6 color options is equivalent to simply choosing where in the line-up of 8 objects the 3 shirts should go. Thus, the number of ways to do this is ${{6+3-1} \choose {3}} = {8 \choose 3} = 56$, the same answer as before.
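This one-step method is also easy to confirm directly (a sketch; again using `combinations_with_replacement` as the ground truth, including cases where there are fewer colors than shirts):

```python
from math import comb
from itertools import combinations_with_replacement

# C(n+k-1, k) vs direct enumeration of k-shirt multisets from n colors
for n, k in [(6, 3), (6, 5), (4, 7), (2, 10)]:
    count = len(list(combinations_with_replacement(range(n), k)))
    assert comb(n + k - 1, k) == count
print("shirts-and-bars count matches enumeration")
```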

To recap, you now know two different stars and bars methods which you can apply in combinatorial counting problems. I hope this was informative and thoroughly addressed the request. If any questions remain, feel free to ask in the comments section.

Lastly, here are the hints for the two proofs of the Vandermonde Convolution:

Algebraic proof: Note that ${{n+k-1} \choose {k}}$ is the coefficient of $x^k$ in the expansion of $(1+x)^{n+k-1}$. What is the coefficient of $x^k$ in the expansion of $(1+x)^{n} (1+x)^{k-1}$?

Combinatorial Proof: Suppose we have a bag of $n+k-1$ (numbered) marbles, of which $n$ are blue and $k-1$ are red. How many different sets of $k$ marbles can we create from the bag? How many ways can we choose $k$ marbles from the bag, where $j$ are blue and $k-j$ are red?

Einstein on Brownian Motion

In 1827, botanist Robert Brown observed through his microscope that pollen particles in resting water exhibited motion which he was unable to explain. This motion came to be known as Brownian motion, which now also has a precise mathematical definition as a stochastic (i.e. random) process.

Fast forward to 1905: the scientific community had not yet accepted, as we do today, that matter consists of atoms. Though analysis based on atoms and molecules had proven useful in some applications, many physicists regarded atoms as a hypothetical tool rather than a physical reality. Alternative theories postulated that the structure of the world consisted, for example, of different forms of energy and energy transformations. Albert Einstein's 1905 paper "On the movement of particles suspended in stationary liquids required by the molecular-kinetic theory of heat" (kind of a mouthful, but hey, the guy was German, so what did you expect?) explained Brownian motion as arising from the random bombardments of the pollen particles by moving water molecules, like in the animation below:

Einstein was able to derive formulas for the mean square displacement of a suspended particle as well as Avogadro's number, and when Jean Perrin experimentally verified these results in 1908, the remaining skeptics accepted this as evidence of the physical reality of atoms.

In this post, I will walk you through Einstein's pivotal analysis of Brownian motion.

Note: nothing in this post is new; it's all drawn from various sources, including Einstein's paper, Robert E. Kennedy's book on the same (as well as other Einstein papers), and other sources from around the internet. I found it very difficult to find any single source that provided a satisfactory explanation of every step in Einstein's paper, so I hope this post, which has taken me quite a while, achieves that and is clear enough to follow easily without the need to scour the internet for additional information. If not, please post any questions in the comments section!

The setup

We have a container with volume $V$ filled with liquid (the solvent; let's just call it water going forward) and a solute (e.g. sugar) dissolved therein. The container is divided into two sections by a wall through which the water can flow freely, but which is impermeable to the sugar, and the latter is confined to the walled-off section of the container, which has volume $V^*$. The volume of the remainder of the container is then $V-V^*$.

Since the sugar molecules cannot pass through the wall, they bounce back when they hit it and thus exert a pressure on the wall (the water passes right through the wall and thus does not exert a pressure on it). Van 't Hoff's law states that if the number of moles of sugar in the volume $V^*$ is $z$, then this pressure $\Pi$, known as the osmotic pressure, is given by $\Pi V^* = zRT$, where $R$ is the gas constant and $T$ is the temperature in Kelvin. This is the same as the ideal gas law, which is not totally unexpected since the pressure is due to the sugar molecules bouncing around in the volume $V^*$ the same way a gas in a closed container would.

If, instead of sugar, some larger particles are suspended (but not dissolved, the difference lying solely in the particle size) in the water, according to Einstein, the osmotic pressure on the wall should be given by the same formula as that for solutions above; in other words, the osmotic pressure only depends on the number (or more precisely, the concentration, or number per unit volume) of dissolved/suspended particles. The next section shows his derivation based on the "molecular-kinetic theory of heat."

The osmotic pressure due to suspended particles

In the last post, I showed you Einstein's derivation of a formula for the entropy of $n$ particles in the water bouncing around due to random molecular motion and collisions: $$
S = \frac{\bar{E}}{T} + k \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right] \tag{1}
$$ Here, $\bar E$ is the total energy of the $n$ particles, $T$ is the absolute temperature in Kelvin, $k$ is the Boltzmann constant (equal to the gas constant $R$ divided by Avogadro's number $N$), and the $p_i$'s and $q_i$'s are the state variables of the system of $n$ particles, representing the 3 components of momentum and position of each particle, hence the indices' running up to $3n$. The integral is taken over the different configurations of the system (i.e. different possible values of position and momentum of the particles). The $E$ inside the integral is the energy of the system as a function of the state variables.

The free energy (Helmholtz free energy) of the system is defined as $F = E - TS \ \ \text{(2)}$. If you read the Wikipedia article on Helmholtz free energy, you can see the derivation of the formula $dF = -S \, dT - P \, dV$ via basic calculus (in particular, the product rule for differentiation). From this formula, it's clear that $-\dfrac{\partial F}{\partial V} = P$ when the volume changes by an infinitesimal amount at constant temperature. We'll use this relation later on.
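For completeness, here is that short derivation, using the product rule on $TS$ together with the thermodynamic identity $dE = T \, dS - P \, dV$ for a reversible process: $$
dF &= dE - T \, dS - S \, dT \\[3mm]
&= (T \, dS - P \, dV) - T \, dS - S \, dT \\[3mm]
&= -S \, dT - P \, dV
$$ Setting $dT = 0$ then gives $P = -\dfrac{\partial F}{\partial V}$ at constant temperature.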

Plugging in equation (1) for $S$ in equation (2), we obtain $$
F &= -kT \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right] \\[3mm]
&= -kT \ln B \tag{3}
$$ The key insight that makes the nearly-impossible calculation of the integral $B$ unnecessary is that $B$ can be shown to have the form $B = J(V^{*})^n \ \ \text{(4)}$, where $J$ is some function that does not depend on $V^*$. I'll go through the details of that derivation at the end of this post so as not to deviate too far from the more interesting line of reasoning.

Combining equations (3) and (4), we recover van 't Hoff's law for the osmotic pressure $\Pi$: $$
\Pi &= -\dfrac{\partial F}{\partial V^*} \\[3mm]
&= \dfrac{\partial}{\partial V^*} kT \ln [ J(V^{*})^n ] \\[3mm]
&= \dfrac{\partial}{\partial V^*} kT \, [\ln J + n \ln V^{*} ] \\[3mm]
&= kT\dfrac{n}{V^*} \\[3mm]
&\Downarrow \\[3mm]
\Pi V^* &= (Nk)T\dfrac{n}{N} \\[3mm]
\Pi V^* &= zRT
$$ In the last line, we used the fact that $k = \dfrac{R}{N}$ and denote the number of moles of particles by $z$. We have shown that van 't Hoff's law is a consequence of the molecular-kinetic theory. This law can equivalently be expressed as $\Pi = kT \nu \tag{$\spadesuit$}$ where $\nu = \dfrac{n}{V^*}$ is the concentration of particles in the volume $V^*$.
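As a quick units sanity check on $(\spadesuit)$ (illustrative numbers, not values from Einstein's paper): a concentration of 1 mole of particles per cubic meter at $T = 293$ K gives

```python
R = 8.314       # gas constant, J/(mol K)
N_A = 6.022e23  # Avogadro's number, 1/mol
k = R / N_A     # Boltzmann constant, J/K

T = 293.0       # temperature, K
nu = 1.0 * N_A  # 1 mol/m^3 expressed as particles per m^3

Pi = k * T * nu  # osmotic pressure from Pi = k T nu, in Pa
print(Pi)        # identical to z R T / V* with z/V* = 1 mol/m^3
```

As expected, $kT\nu$ and $zRT/V^*$ give the same pressure (about $2.4 \times 10^3$ Pa here), since the factors of Avogadro's number cancel.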

The diffusion coefficient

In deriving a formula for the diffusion coefficient (i.e. the constant in the differential equation governing the diffusion of the particles over time; if you aren't familiar with the diffusion equation, you'll see it presently), Einstein introduced a fictitious force and used the fact that the free energy is minimized when the system is in dynamic equilibrium. I couldn't fully understand this argument (if you do, please click here and answer my questions!), so instead, I will show a slightly more direct derivation by letting the force due to osmotic pressure be balanced by friction in the fluid.

Suppose again we have $n$ particles in the volume $V^*$, which has a cross-sectional area $A$ perpendicular to the $x$-axis, and that the concentration of the particles $\nu (x)$ may vary with $x$. Consider the volume $\Delta V^*$ between $x$ and $x+ \Delta x$, which contains $N_1$ particles. We have: $$
\frac{\Pi (x) - \Pi (x+ \Delta x)}{\Delta x} = \frac{F(x) - F(x+ \Delta x)}{A \Delta x} = \frac{N_{1} F_{\Pi}}{A \Delta x}
$$ where $\Pi$ is the osmotic pressure. $F(x)$ is the force on the cross-sectional area at $x$ (positive $F$ would be to the right), and $F_{\Pi}$ is the average force on a particle in the region $\Delta V^*$. Taking the limit as $\Delta x \rightarrow 0$ gives: $$
\frac{\partial \Pi}{\partial x} = - \nu F_{\Pi} \tag{5}
$$ Note that both sides are functions of $x$ and time, but I am leaving the $x$'s and $t$'s out to simplify the notation a bit. The $\frac{N_1}{A \Delta x} = \frac{N_1}{\Delta V^*}$ became $\nu$ in the limit since that was the number of particles per unit volume.

The force $F_{\Pi}$ acts on the particles and "competes" with the force of friction in the fluid. The drag force due to friction is proportional to the particle velocity since a faster-moving particle hits more molecules that slow it down; if the particles are spheres, the drag force is given by Stokes' Law: $$
F_d = -6 \pi \mu R_{p} v \tag{6}
$$ where $\mu$ is the viscosity coefficient of the fluid (measurable by simple experiments), $R_p$ is the particle radius, and $v$ is the particle velocity. This lower-case $\pi$ is the usual 3.14. The negative sign reflects the fact that the drag force points in the opposite direction of the velocity. I'm not going to present a derivation of Stokes' Law since it would be very long, so let's take that as a given and plug on.

Before combining all this, we need one more tidbit. Taking  the partial derivative with respect to $x$ on both sides of equation $(\spadesuit)$ from above, we obtain: $$
\frac{\partial \Pi}{\partial x} = kT \frac{\partial \nu}{\partial x}
$$ In dynamic equilibrium, $F_{\Pi} + F_{d} = 0$, so combining (5) and (6) with the above gives: $$
- \frac{1}{\nu} \frac{\partial \Pi}{\partial x} &= 6 \pi \mu R_{p} v \\[3mm]
\implies -kT \frac{\partial \nu}{\partial x} &= 6 \pi \mu R_{p} \nu v \tag{7}
$$ Now $\nu v$, the density times the velocity, is the flux of particles (at position $x$ and time $t$), i.e. the number of particles passing through a unit area perpendicular to the $x$-axis, per second. Notice that the units are $\dfrac{\text{particles}}{\text{m}^3} \times \dfrac{\text{m}}{\text{s}} = \dfrac{\text{particles}}{\text{m}^2 \ \text{s}}$, as one would expect of such a quantity.

Fick's First Law states that this flux equals $-D \frac{\partial \nu}{\partial x}$, where $D$ is the diffusion coefficient referred to above. The negative sign means that if the particle density is higher on the left, particles tend to diffuse to the right. $D$ has units of $\frac{\text{m}^2}{\text{s}}$ and measures the mean squared displacement of particle diffusion per unit time. This Stack Exchange thread has a detailed explanation of the physical interpretation of the diffusion coefficient, and this Wikipedia article presents a quick and simple derivation of Fick's First Law.

Combining Fick's First Law with equation (7), we see that $$
D = \frac{kT}{6 \pi \mu R_{p}} = \frac{RT}{N} \frac{1}{6 \pi \mu R_{p}} \tag{8}
$$
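To get a feel for the size of $D$ in equation (8), here are illustrative numbers for a pollen-sized sphere in water (my assumed values, not figures from the paper):

```python
import math

k = 1.380649e-23  # Boltzmann constant, J/K
T = 293.0         # temperature, K
mu = 1.0e-3       # viscosity of water at ~20 C, Pa s (approximate)
R_p = 0.5e-6      # particle radius, m (a 1-micron-diameter sphere)

D = k * T / (6 * math.pi * mu * R_p)  # equation (8)
print(D)  # ~4.3e-13 m^2/s
```

A diffusion coefficient this small is why the jiggling of pollen grains is visible only under a microscope.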

Root mean squared displacement

Assume the $n$ particles' movements are independent and introduce a time interval $\tau$. This time interval is much shorter than the time intervals of observation but long enough that we can consider the movements of a single particle in consecutive time intervals of length $\tau$ to be independent.

Since the movements of the $n$ particles are independent of each other, we can think of them as $n$ different stochastic processes. In other words, we can consider them $n$ separate observations of results of the same random experiment. In a time interval $\tau$, the $x$-coordinate of the position of any given particle will change by some amount $\Delta$. If a particle moves to the right, $\Delta > 0$, and if it moves to the left, $\Delta < 0$. The values of $\Delta$ follow some probability distribution $\phi (\Delta)$ so that after a time period $\tau$ elapses, the proportion of particles $\frac{dn}{n}$ which have experienced a displacement in the $x$ direction between $\Delta$ and $\Delta + d \Delta$ satisfies the equation $$
\frac{dn}{n} = \phi(\Delta) \, d \Delta
$$ There are a few conditions that the probability distribution $\phi$ should satisfy. First, all probabilities should add up to 1, so $$
\int \limits_{-\infty}^{\infty}{\phi(\Delta) \, d \Delta} = 1
$$ Also, since $\tau$ is small, $\phi( \Delta)$ should be zero except for very small absolute values of $\Delta$. Finally, a particle should be equally likely to have been displaced to the left or right, so $\phi$ should be an even function, i.e. $$
\phi(\Delta) = \phi( - \Delta)
$$ If we denote the particle density by $\nu = f(x,t)$, then by the definition of $\phi$, we have: $$
f(x,t + \tau) = \int \limits_{-\infty}^{\infty}{f(x+ \Delta , t) \phi(\Delta) \, d \Delta} \tag{9}
$$ This equation states that the number of particles at position $x$ at time $t + \tau$ is the sum (integral) of the numbers of particles that were at positions $x + \Delta$ at time $t$ and were displaced by $-\Delta$ in the time $\tau$ (recall that $\phi(-\Delta) = \phi(\Delta)$), summed over the possible values of $\Delta$. Since $\phi(\Delta)$ is assumed to be non-zero only for very small values of $\Delta$, extending the integral out from $-\infty$ to $\infty$ doesn't change the value, but it will be useful below since it will allow us to use well known results for Gaussian integrals.

Now we can expand $f$ into a Taylor series on both sides of equation (9). On the left, we have $$
f(x, t + \tau) &= f(x,t) + \tau \frac{\partial f}{\partial t} + \frac{1}{2} \tau^{2} \frac{\partial^{2} f}{\partial t^2} + ... \\[3mm]
&\approx f(x,t) + \tau \frac{\partial f}{\partial t}
$$ On the right side, we need to expand out to second order (you'll see why in a second). Note that it is justified to do the Taylor expansion inside the integral since we are expanding around $\Delta = 0$, and only small values of $\Delta$ (i.e. those for which the Taylor expansion is accurate) contribute anything to the integral: $$
f(x,t) + \tau \frac{\partial f}{\partial t} &\approx
\int \limits_{-\infty}^{\infty}{\left[ f(x,t) + \Delta \frac{\partial f}{\partial x} + \frac{1}{2} \Delta^{2} \frac{\partial^{2} f}{\partial x^2} \right] \phi(\Delta) \, d \Delta} \\[3mm]
&= f(x,t) \int \limits_{-\infty}^{\infty}{\phi(\Delta) \, d \Delta}
+ \frac{\partial f}{\partial x} \int \limits_{-\infty}^{\infty}{\Delta \phi(\Delta) \, d \Delta}
+ \frac{1}{2} \frac{\partial^{2} f}{\partial x^2} \int \limits_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}
$$ The first integral on the right-hand side is equal to 1, and the second is zero since $\phi(\Delta)$ is an even function (so $\Delta \phi(\Delta)$ is an odd function). So the above simplifies to $$
\frac{\partial f}{\partial t} = \frac{1}{2 \tau} \frac{\partial^{2} f}{\partial x^2} \int \limits_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}
$$ For any choice of the distribution $\phi$, the quantity $\frac{1}{2 \tau} \int_{-\infty}^{\infty}{\Delta^{2} \phi(\Delta) \, d \Delta}$ will be a constant, which we call $D$. It follows that the particle density $f$ satisfies the well known diffusion equation (or Fick's Second Law): $$
\frac{\partial f}{\partial t} = D \frac{\partial^{2} f}{\partial x^2} \tag{10}
$$ This $D$ is the same one referred to in Fick's First Law from above (in fact, the same Wikipedia article shows a derivation of the diffusion equation from Fick's First Law). This means that the formula derived above for the diffusion coefficient is valid here, and equation (10) completely describes the evolution of the particle density.
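As a numerical sanity check of equation (10) (a sketch with an arbitrary $D$; the Gaussian used here is the point-source solution that appears just below), we can compare $\frac{\partial f}{\partial t}$ and $D \frac{\partial^{2} f}{\partial x^2}$ via finite differences:

```python
import math

D = 1.0e-3  # arbitrary diffusion coefficient (illustrative units)

def f(x, t):
    # spreading Gaussian, normalized so it integrates to 1 over x
    return math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

# central differences at a sample point (x0, t0)
x0, t0, h = 0.07, 2.0, 1e-4
df_dt = (f(x0, t0 + h) - f(x0, t0 - h)) / (2 * h)
d2f_dx2 = (f(x0 + h, t0) - 2 * f(x0, t0) + f(x0 - h, t0)) / h ** 2

print(df_dt, D * d2f_dx2)  # the two sides of equation (10) agree
```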

If we consider a separate coordinate system for each particle whose origin is at the particle's position at $t=0$, then equation (10) still holds, except now $f(x,t)$, instead of describing number of particles at position $x$ at time $t$, would describe the number of particles experiencing a displacement of $x$ from their respective initial positions in an elapsed time $t$. Given the conditions $f(x,0)=0$ for $x \neq 0$ and $\int \limits_{-\infty}^{\infty}{f(x,t) \, dx} = n$, the differential equation (10) is that of diffusion from a point, and its solution is: $$
f(x,t) = \frac{n}{\sqrt{4 \pi D t}} \exp \left(\frac{-x^2}{4Dt} \right)
$$ Since $f(x,t)$ represents a number of particles, $\frac{1}{n} f(x,t)$ is the probability that a given particle experiences a displacement $x$ in time $t$. Indeed, using the Gaussian integral formula $\int_{-\infty}^{\infty}{e^{-ax^2}dx} = \sqrt{\frac{\pi}{a}}$ with $a=\frac{1}{4Dt}$, we see that $\int_{-\infty}^{\infty}{\frac{1}{n}f(x,t) \, dx}=1$. It follows that the mean value of the squared displacement is: $$
\left< x^2 \right> &= \int \limits_{-\infty}^{\infty}{x^{2} \frac{1}{n} f(x,t) \, dx} \\
&= \frac{1}{\sqrt{4 \pi Dt}}\int \limits_{-\infty}^{\infty}{x^{2} \exp \left( \frac{-x^2}{4Dt} \right) \, dx} \\[3mm]
&= 2Dt
$$ where the last step utilized the Gaussian integral formula $\int_{-\infty}^{\infty}{x^{2} e^{-ax^2} \, dx} = \frac{1}{2} \sqrt{\frac{\pi}{a^3}}$ (with $a=\frac{1}{4Dt}$). Finally, we see that the root mean squared displacement, a measure of the average distance a particle is expected to travel in time $t$, is $\sqrt{\left< x^2 \right>} = \sqrt{2Dt}$, which is proportional to the square root of $t$.
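The $\sqrt{2Dt}$ result can be reproduced with a quick Monte Carlo simulation (a sketch; I take Gaussian steps of variance $2D\tau$, which is one valid choice of $\phi$, and all numerical values are illustrative):

```python
import math
import random

D = 1.0e-3   # arbitrary diffusion coefficient (illustrative units)
tau = 0.01   # short time interval between independent displacements
t = 1.0      # total elapsed time
n = 20000    # number of independent particles

random.seed(0)
sigma = math.sqrt(2 * D * tau)  # per-step displacement standard deviation
steps = int(t / tau)

# average squared total displacement over n independent walks
mean_x2 = sum(
    sum(random.gauss(0, sigma) for _ in range(steps)) ** 2 for _ in range(n)
) / n

print(math.sqrt(mean_x2), math.sqrt(2 * D * t))  # rms displacement vs sqrt(2Dt)
```

The two printed values agree to within the Monte Carlo sampling error of about one percent.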

This result was verified by experiment not long after Einstein published his paper, and this is often considered the defining result that convinced the remaining skeptics at the time that atoms existed.

I hope you enjoyed reading and please post any questions in the comments section.

For those interested, I promised an appendix for one of the details that was glossed over above:

Appendix: derivation of $B = J (V^*)^n$

Consider the $n$ particles at positions $(x_1 , y_1, z_1 ), ... , (x_n , y_n, z_n )$, each surrounded by an infinitesimal parallelepiped region $dx_i \, dy_i \, dz_i$ contained in $V^*$. Now consider the same $n$ particles at different positions $(x_i ', y_i ', z_i ')$, surrounded by parallelepiped regions of the same sizes as before (so no relabeling of the $dx_i \, dy_i \, dz_i$ is needed). Writing $$
dB  = J \, dx_1 \, dy_1 \, ... \, dz_n
$$ and $$
dB'  = J' \, dx_1 \, dy_1 \, ... \, dz_n
$$ we see that $\dfrac{dB}{dB'} = \dfrac{J}{J'}$.

The probability that the system is in the first configuration is $dB$ divided by the integral over all configurations, $B$, and likewise, the probability that the system is in the second configuration is $\frac{dB'}{B}$. Since the particles move independently and the parallelepiped regions are the same size, the two probabilities must be equal, and so $$
\frac{dB}{B} = \frac{dB'}{B} \implies dB = dB' \implies J = J'
$$ This shows that $J$ is independent of the particle positions and of the volume $V^*$, and so $$
B = \int dB &= \int \limits_{V^*}{J \, dx_1 \, dy_1 \, ... \, dz_n} \\[3mm]
&= J \int \limits_{V^*}{dx_1 \, dy_1 \, ... \, dz_n} \\[3mm]
&= J (V^*)^n \ \tag*{$\square$}
$$