## Einstein's Formula for Entropy

Entropy is often described as a measure of "disorder", which, as you may recall from chemistry, tends to increase over time in an isolated system. A physical system tends to evolve in such a way that its useful energy dissipates, and the value of entropy measures how far along the system is in that process. There are a few different formulations that each capture this idea. Each one is a formula that must be calculated, as opposed to a directly observable property of the system like temperature or pressure.

In this post, I'm going to show you Einstein's derivation of a formula for entropy, which will also shed some light on what exactly this quantity represents. This formula will be an ingredient in the forthcoming post on Einstein's Brownian motion paper, so pay attention!

Imagine a system consisting of $n$ atoms of an ideal gas in a closed container ($n$ will be a big number, on the order of $10^{23}$). It doesn't actually need to be a gas, but a gas seems easiest to picture. We'll take the gas to be monatomic and ideal: since each molecule is a single atom, we can ignore rotational energy, and since the gas is ideal, we can ignore interactions between the atoms. In other words, we are concerned only with their translational motion, and we assume all collisions are elastic. Finally, we assume the container of gas is surrounded by an ambient reservoir with which it can exchange heat, so that the system's temperature $T$ stays essentially constant.

Each atom has a position and a momentum, each a 3-dimensional vector, i.e. a vector with 3 components. At any given time, the $2 \times 3n = 6n$ components of position and momentum for all the atoms, denoted $q_1, q_2, ..., q_{3n}$ and $p_1, p_2, ..., p_{3n}$ respectively, determine the state of the system. The set of all possible states is called state space (or phase space), a subset of $\Bbb R ^ {6n}$; the term "configuration space" usually refers to the positions $q_i$ alone. The $q_i$'s and $p_i$'s are called the state variables of the system.
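To make the bookkeeping concrete, here is a minimal Python sketch of one point of state space for $n$ atoms, stored as two flat lists of $3n$ numbers each. The storage order (the three components of each atom kept together) and the helper names `random_state`, `as_point` and `atom` are hypothetical choices for illustration only.

```python
import random

def random_state(n, q_scale=1.0, p_scale=1.0, seed=0):
    """Hypothetical random state for n atoms: q and p are lists of 3n floats each."""
    rng = random.Random(seed)
    q = [rng.uniform(0.0, q_scale) for _ in range(3 * n)]  # positions q_1 .. q_{3n}
    p = [rng.gauss(0.0, p_scale) for _ in range(3 * n)]    # momenta p_1 .. p_{3n}
    return q, p

def as_point(q, p):
    """Concatenate (q, p) into a single point of state space R^(6n)."""
    return q + p

def atom(q, p, i):
    """Position and momentum 3-vectors of atom i (0-based), assuming atom-by-atom storage."""
    return q[3 * i:3 * i + 3], p[3 * i:3 * i + 3]

q, p = random_state(n=4)
x = as_point(q, p)
print(len(q), len(p), len(x))  # 12 12 24
```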

The First Law of Thermodynamics states that energy can be converted into different forms but not created or destroyed. Thus, as the system evolves, its change in energy is the work done on the system plus the heat supplied to the system: $$dE = dW + dQ$$ If the system is described by a set of parameters $\lambda_1, \lambda_2, ... , \lambda_m$ together with the state variables mentioned above, and it undergoes a small change over a time interval $dt$, then by the chain rule the resulting change in energy is: $$dE = \sum_{i=1}^{m}{\frac{\partial E}{\partial \lambda_i}\frac{d\lambda_i}{dt}dt} + \sum_{j=1}^{3n}{\left( \frac{\partial E}{\partial q_j}\frac{dq_j}{dt}dt + \frac{\partial E}{\partial p_j}\frac{dp_j}{dt}dt \right)} \tag{\spadesuit}$$ To be a bit more concrete, the "parameters" above would be things like the volume of the container, so the first sum is identified with the work done on the system (usually just $dW = -P \, dV$, since compressing the gas does work on it). The second sum is identified with the heat supplied to the system: more heat in the system (all else equal) means a higher temperature, i.e. the atoms bounce around faster, and so the second sum is our $dQ$. In the example we're talking about, the energy is due only to the translational kinetic energy of the atoms, and thus $E=\sum_{j=1}^{3n}{\frac{p_j^2}{2m_0}}$, where $m_0$ is the mass of one atom (written this way to avoid a clash with the number $m$ of parameters). There would also be a term involving the $q_j$'s if we took gravitational potential energy into account, since it depends on the particles' positions.
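Equation $(\spadesuit)$ is just the chain rule, valid to first order in small changes. A minimal Python check, assuming a hypothetical toy energy: the kinetic term above plus a harmonic trap $\frac{\lambda}{2}\sum_j q_j^2$ whose stiffness $\lambda$ plays the role of a single parameter (the ideal-gas energy above has no explicit parameter dependence to differentiate). The mass `M0`, the step size `eps` and the random state are arbitrary illustrative choices.

```python
import random

M0 = 1.0  # hypothetical atom mass, arbitrary units

def energy(q, p, lam):
    """Toy energy: sum p_j^2/(2 M0) plus a hypothetical harmonic trap lam * sum q_j^2 / 2."""
    kinetic = sum(pj * pj for pj in p) / (2 * M0)
    potential = lam * sum(qj * qj for qj in q) / 2
    return kinetic + potential

def spadesuit_terms(q, p, lam, dq, dp, dlam):
    """First-order change split as in (spadesuit): (parameter term, state-variable term)."""
    work = sum(qj * qj for qj in q) / 2 * dlam                  # (dE/dlam) dlam
    state_term = sum(lam * qj * dqj + pj / M0 * dpj             # (dE/dq_j) dq_j + (dE/dp_j) dp_j
                     for qj, pj, dqj, dpj in zip(q, p, dq, dp))
    return work, state_term

rng = random.Random(1)
n = 5                                           # a handful of atoms, 3n = 15 coordinates
q = [rng.uniform(-1.0, 1.0) for _ in range(3 * n)]
p = [rng.gauss(0.0, 1.0) for _ in range(3 * n)]
lam, eps = 2.0, 1e-6
dq = [eps * rng.gauss(0.0, 1.0) for _ in q]     # small random change of every state variable
dp = [eps * rng.gauss(0.0, 1.0) for _ in p]
dlam = eps                                      # small change of the parameter

q_new = [a + b for a, b in zip(q, dq)]
p_new = [a + b for a, b in zip(p, dp)]
exact = energy(q_new, p_new, lam + dlam) - energy(q, p, lam)
work, state_term = spadesuit_terms(q, p, lam, dq, dp, dlam)
print(exact, work + state_term)
```

The two printed numbers differ only by terms of order `eps**2`, which is why $(\spadesuit)$ is exact for infinitesimal changes.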

The above formula doesn't depend on which path the system takes through state space, so in particular it holds for an adiabatic change, i.e. one in which the system neither gains nor loses heat, so $dQ=0$. Before such a change occurs, the probability of finding the system in a small box of state space around a given state, whose energy is $E$, is the volume of the box times the probability density there: $$d{\Bbb P} = Ce^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}$$ The probability density $Ce^{\frac{-E}{kT}}$ deserves a bit of attention. $T$ is the temperature of the system, considered constant as mentioned above; $k$ is the Boltzmann constant, $1.38 \times 10^{-23}$ joules per kelvin (units of energy per temperature), which makes the exponent dimensionless; and $C$ is a normalization constant which makes all the probabilities add up to 1. We'll solve for $C$ in a moment, but first: why is the probability density given by an exponential?
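To get a feel for the size of the exponent, here is a short Python sketch that evaluates $kT$ at a hypothetical room temperature of 300 K, using the exact SI value $k = 1.380649 \times 10^{-23}$ J/K, and compares the density $Ce^{-E/kT}$ at two states. The normalization constant $C$ cancels in the ratio, so it never has to be computed; the function name `boltzmann_ratio` is just an illustrative choice.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def boltzmann_ratio(e1, e2, temperature):
    """Ratio of the densities C*exp(-E/kT) at two states with energies e1, e2 (joules); C cancels."""
    return math.exp(-(e1 - e2) / (K_B * temperature))

T = 300.0        # kelvin, a hypothetical room temperature
kT = K_B * T     # thermal energy scale, about 4.14e-21 J
print(f"kT at {T} K = {kT:.3e} J")
# A state one kT higher in energy has a density e^{-1}, about 0.368, times as large:
print(boltzmann_ratio(kT, 0.0, T))
```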

Suppose we have two pockets of gas in the container with energies $\epsilon_1$ and $\epsilon_2$. Since the model treats them as independent, the probability of finding pocket 1 at energy $\epsilon_1$ and pocket 2 at energy $\epsilon_2$ should be the product of the individual probabilities. Furthermore, the energies add, so that the total energy of the two pockets combined is $\epsilon_1 + \epsilon_2$. The exponential function has exactly the desired property of turning sums into products: $$\exp \left( \frac{-(\epsilon_1 + \epsilon_2)}{kT} \right) = \exp \left( \frac{-\epsilon_1}{kT} \right) \exp \left( \frac{-\epsilon_2}{kT} \right)$$ The fact that the "drop-off factor" in the denominator of the exponent is proportional to $T$ is a consequence of the fact that the average kinetic energy of an atom in the gas is $\frac{3}{2}kT$, which in turn follows from comparing the kinetic-theory expression for the pressure with the ideal gas law $PV = nkT$ (with $n$ the number of atoms). For a simple and insightful derivation of the $e^{\frac{-E}{kT}}$ factor based on Maxwell's analysis, click here.
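Both claims in this paragraph can be checked numerically. The Python sketch below assumes units in which $k = 1$ (so `kT` is just a number) and arbitrary hypothetical values for the mass and temperature. It checks the product property, then draws each momentum component from the density $e^{-p^2/2mkT}$, which is a Gaussian with variance $mkT$, to estimate the average kinetic energy per atom.

```python
import math
import random

def factorizes(e1, e2, kT):
    """Check exp(-(e1+e2)/kT) == exp(-e1/kT) * exp(-e2/kT) up to rounding."""
    joint = math.exp(-(e1 + e2) / kT)
    product = math.exp(-e1 / kT) * math.exp(-e2 / kT)
    return math.isclose(joint, product, rel_tol=1e-12)

def mean_kinetic_energy(mass, kT, samples=200_000, seed=0):
    """Monte Carlo average of |p|^2/(2m) when each momentum component is drawn
    from the Boltzmann density exp(-p^2/(2 m kT)), i.e. a Gaussian with variance m*kT."""
    rng = random.Random(seed)
    sigma = math.sqrt(mass * kT)
    total = 0.0
    for _ in range(samples):
        px, py, pz = (rng.gauss(0.0, sigma) for _ in range(3))
        total += (px * px + py * py + pz * pz) / (2 * mass)
    return total / samples

print(factorizes(0.3, 1.7, kT=0.5))                       # True
print(mean_kinetic_energy(mass=2.0, kT=0.5), 1.5 * 0.5)   # estimate vs. 3/2 kT = 0.75
```

With 200,000 samples the estimate should land within a fraction of a percent of $\frac{3}{2}kT$, independent of the mass.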

Back to the main line: to solve for $C$, we note that the probabilities of all possible states must add up (integrate) to 1. Since $C$ must be positive for the probabilities to be non-negative, we can write it as $C = e^c$, and thus: \begin{align} 1 &= \int{d{\Bbb P}} \\[3mm] &= \int e^c e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm] &= e^c \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm] & \Downarrow \\[3mm] c &= - \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right] \end{align} The integrals above are taken over the entire range of possible values of the $q_i$'s and $p_i$'s, i.e. over all of state space.
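For a real gas the integral defining $c$ runs over $6n$ dimensions, but its structure can be seen in a toy. The Python sketch below assumes a hypothetical single particle in one dimension with energy $E = \frac{p^2}{2m} + \frac{\kappa q^2}{2}$ (a harmonic trap stands in for the container so that the $q$-integral converges) and units with $k = 1$. The integral then factors into two Gaussian integrals whose exact product is $2\pi kT\sqrt{m/\kappa}$, and a simple midpoint rule reproduces the resulting $c$.

```python
import math

def gaussian_integral(a, half_width, steps=20_000):
    """Midpoint-rule approximation of the integral of exp(-a x^2) over [-half_width, half_width]."""
    h = 2 * half_width / steps
    return h * sum(math.exp(-a * (-half_width + (i + 0.5) * h) ** 2) for i in range(steps))

def log_normalizer_c(mass, kappa, kT):
    """c = -ln( integral of exp(-E/kT) dp dq ) for the hypothetical toy energy
    E = p^2/(2 mass) + kappa q^2 / 2 (one particle, one dimension, units with k = 1).
    The integral factorizes into a p-integral times a q-integral."""
    p_width = 12 * math.sqrt(mass * kT)   # about 12 standard deviations; the tails are negligible
    q_width = 12 * math.sqrt(kT / kappa)
    z = gaussian_integral(1 / (2 * mass * kT), p_width) * gaussian_integral(kappa / (2 * kT), q_width)
    return -math.log(z)

mass, kappa, kT = 1.5, 4.0, 0.8
c_numeric = log_normalizer_c(mass, kappa, kT)
c_exact = -math.log(2 * math.pi * kT * math.sqrt(mass / kappa))   # closed form of the same integral
print(c_numeric, c_exact)
```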

After our adiabatic change in the system, there will be a similar expression for $d{\Bbb P}$. For notational convenience, define $\beta := \frac{1}{2kT}$, so that the density reads $e^{c - 2\beta E}$. After the change, $c$ may have shifted a bit to $c + dc$, and $\beta$ to $\beta + d\beta$. The energy $E$ will also shift to $E + dE = E + \sum_{i=1}^{m}{\frac{\partial E}{\partial \lambda_i}\frac{d\lambda_i}{dt}dt}$. Here we've used equation $(\spadesuit)$ and the fact that $dQ = 0$, so only the $dW$ term comes into play. To save on symbols, I'll also start writing the $\frac{d\lambda_i}{dt}dt$'s simply as $d \lambda$, and drop the indices and limits on the sums.

Proceeding as before, the probabilities after the change must also add up to 1: \begin{align} 1 = &\int{d{\Bbb P}} \\[3mm] = &\int \exp \left( (c+dc)-2(\beta+d\beta)\left(E + \sum{\frac{\partial E}{\partial \lambda} d\lambda}\right) \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \\[3mm] = &\int \exp \left( dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) \right) \\[1mm] &\times \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \end{align} Every term in the first exponent is a small differential, so we can expand that exponential into a Taylor series and neglect the terms past first order: \begin{align} 1 = \int &\left[ 1 + dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) + \frac{1}{2} (dc - 2 (...))^2 + ... \right] \\[1mm] & \times \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm] \approx \int &\left[ 1 + dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) \right] \\[1mm] & \times \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm] = \int &\exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[1mm] + \int & \left[ dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) \right] \\[1mm] & \times \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm] = \ \ \ & 1 \\[1mm] + \int & \left[ dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} +d\beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) \right] \\[1mm] & \times \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\[3mm] \Downarrow \\[3mm] 0 \ \approx \int & \left[ dc - 2 \left( E \, d\beta + \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} \right) \right] \exp \left( c-\frac{E}{kT} \right) dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n}\\ \end{align} Note: in the step where the first integral is replaced by $1$, its integrand is the probability density from before the change, so its integral over all of state space equals 1. Also, after the $\Downarrow$, we dropped the last term in the square brackets because we are ignoring terms above first order, i.e. terms containing products of two or more differentials (here, $d\beta \, d\lambda$).
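The truncation step rests on the fact that the error of the linearization $e^x \approx 1 + x$ shrinks like $\frac{x^2}{2}$, i.e. faster than $x$ itself. A quick Python check of that rate, with `linearization_error` a name chosen just for this sketch:

```python
import math

def linearization_error(x):
    """Error made by replacing exp(x) with its first-order Taylor polynomial 1 + x."""
    return math.exp(x) - (1.0 + x)

for x in (1e-1, 1e-2, 1e-3):
    err = linearization_error(x)
    # err / x**2 approaches 1/2 as x shrinks, i.e. the error is second order
    print(f"x = {x:g}: error = {err:.3e}, error / x^2 = {err / x**2:.4f}")
```

Since every term in the bracket is a differential, the discarded terms are products of differentials and vanish faster than the terms that are kept.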

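The second factor in the integrand, $\exp \left( c-\frac{E}{kT} \right)$, is the probability density before the change, so the integral in the last line above is simply the average of the bracketed expression over all of state space. The quantities $dc$, $d\beta$, $\beta$ and $d\lambda$ are the same for every state, so this average vanishing means $$dc - 2\overline{E} \, d\beta - 2 \beta \sum{\overline{\frac{\partial E}{\partial \lambda}} \, d\lambda} = 0,$$ where a bar denotes the average over state space. (The bracket itself need not vanish at every state, since $E$ varies from state to state.) For a macroscopic system with $n$ on the order of $10^{23}$, the relative fluctuations about these averages are negligible, so we identify the averages with the system's energy and its derivatives and drop the bars. Thus: $$dc -2E \, d\beta - 2 \beta \sum{\frac{\partial E}{\partial \lambda} d\lambda} = 0 \tag{1}$$ This equation relies only on normalization and on how the energy shifts with the parameters, so it holds for any small change between nearby equilibrium states, not only adiabatic ones; this is what allows us to combine it with the general First Law. On the other hand, multiplying the equation $dE = \sum{\frac{\partial E}{\partial \lambda}d\lambda} + dQ$, which is $(\spadesuit)$ in our abbreviated notation, by $2 \beta$ and rearranging gives: $$-2 \beta \, dE + 2 \beta \sum{\frac{\partial E}{\partial \lambda} d \lambda} + 2 \beta \, dQ = 0 \tag{2}$$ Adding $(1)$ and $(2)$ eliminates the $d\lambda$ terms and yields: \begin{align} 0 &= dc - 2E \, d\beta - 2\beta \, dE + 2\beta \, dQ \\[2mm] &= dc -2 (E \, d\beta + \beta \, dE) + 2 \beta \, dQ \\[2mm] &= dc -2 \, d(\beta E) + 2 \beta \, dQ \\[2mm] \implies 2 \beta \, dQ &= d(2 \beta E - c) \\[3mm] \implies \frac{dQ}{T} &= d \left( \frac{E}{T}-kc \right) := dS \end{align} In the last step, we plugged in $\beta = \frac{1}{2kT}$ and then multiplied through by the constant $k$.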
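Equation $(1)$, read with averages in place of $E$ and $\frac{\partial E}{\partial \lambda}$, can be checked numerically. The Python sketch below assumes a hypothetical one-particle, one-dimensional toy with $E = \frac{p^2}{2m} + \frac{\kappa q^2}{2}$, where the trap stiffness $\kappa$ plays the role of the single parameter $\lambda$, and units with $k = 1$. It computes $c$, the average energy and the average of $\frac{\partial E}{\partial \kappa} = \frac{q^2}{2}$ by the midpoint rule, then checks that $dc - 2E\,d\beta - 2\beta\frac{\partial E}{\partial \kappa}d\kappa$ vanishes for a small change of $\beta$ and $\kappa$.

```python
import math

MASS = 1.0  # hypothetical particle mass; units with k = 1 (assumptions of this sketch)

def gaussian_moments(a, steps=20_000):
    """Midpoint-rule integral Z of exp(-a x^2) and the weighted average <x^2>."""
    half_width = 12 / math.sqrt(2 * a)   # about 12 standard deviations
    h = 2 * half_width / steps
    z = m2 = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        w = math.exp(-a * x * x) * h
        z += w
        m2 += x * x * w
    return z, m2 / z

def c_and_averages(beta, kappa):
    """Toy density exp(c - 2 beta E) with E = p^2/(2 MASS) + kappa q^2 / 2 and lambda = kappa.
    Returns c, the average energy, and the average of dE/dkappa = q^2 / 2."""
    zp, p2 = gaussian_moments(beta / MASS)    # p-part of the weight: exp(-beta p^2 / MASS)
    zq, q2 = gaussian_moments(beta * kappa)   # q-part of the weight: exp(-beta kappa q^2)
    c = -math.log(zp * zq)
    mean_energy = p2 / (2 * MASS) + kappa * q2 / 2
    return c, mean_energy, q2 / 2

beta, kappa = 0.7, 3.0
d_beta, d_kappa = 1e-4, -2e-4          # a small, arbitrary change of temperature and parameter
c_plus, _, _ = c_and_averages(beta + d_beta / 2, kappa + d_kappa / 2)
c_minus, _, _ = c_and_averages(beta - d_beta / 2, kappa - d_kappa / 2)
_, mean_e, mean_de_dkappa = c_and_averages(beta, kappa)

dc = c_plus - c_minus                  # central difference, error third order in the steps
residual = dc - 2 * mean_e * d_beta - 2 * beta * mean_de_dkappa * d_kappa
print(dc, residual)
```

The residual comes out many orders of magnitude smaller than $dc$ itself. For this toy the average energy is also exactly $kT = \frac{1}{2\beta}$, one $\frac{1}{2}kT$ per quadratic term.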

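We have shown that $\frac{dQ}{T}$ is the total differential of some quantity related to energy and temperature, even though $dQ$ itself is not; we call this quantity entropy and denote it by $S$. Using the formula for $c$ derived above, $S$ is given by: $$S = \frac{E}{T} + k \ln \left[ \int e^{\frac{-E}{kT}}dp_1 \, dp_2 \, ... \, dp_{3n} \, dq_1 \, dq_2 \, ... \, dq_{3n} \right]$$ The two $E$'s play different roles here: in $\frac{E}{T}$ it is the energy of the system (for a macroscopic system, essentially its average), while inside the integral it is the energy as a function of the $q_j$'s and $p_j$'s being integrated over. This entropy formula will be used in the next post on Brownian motion, so stay tuned.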
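As a sanity check of the whole derivation, the Python sketch below evaluates this formula for a hypothetical toy: one particle in one dimension with $E = \frac{p^2}{2m} + \frac{\kappa q^2}{2}$, trap stiffness $\kappa$ as the single parameter, and units with $k = 1$. For this toy the Gaussian integrals give $\int e^{-E/kT}\,dp\,dq = 2\pi kT\sqrt{m/\kappa}$, an average energy of $kT$, and an average $\frac{\partial E}{\partial \kappa} = \frac{q^2}{2}$ of $\frac{kT}{2\kappa}$. The sketch compares $dS$ with $\frac{dQ}{T}$, where $dQ = dE - dW$ and $dW = \frac{\partial E}{\partial \kappa}\,d\kappa$ (averaged), for a small change of $T$ and $\kappa$.

```python
import math

MASS = 1.0  # hypothetical particle mass; units with k = 1 (assumptions of this sketch)

def entropy(T, kappa):
    """S = E/T + k ln Z for the toy E = p^2/(2 MASS) + kappa q^2 / 2, using the Gaussian
    results Z = 2 pi T sqrt(MASS / kappa) and average energy E = T (with k = 1)."""
    z = 2 * math.pi * T * math.sqrt(MASS / kappa)
    mean_energy = T
    return mean_energy / T + math.log(z)

def heat_over_T(T, kappa, dT, d_kappa):
    """dQ / T from the First Law, dQ = dE - dW, with dE = dT (average energy = T) and
    dW = <dE/dkappa> dkappa = <q^2 / 2> dkappa = T / (2 kappa) dkappa."""
    dQ = dT - T / (2 * kappa) * d_kappa
    return dQ / T

T, kappa = 1.3, 2.5
dT, d_kappa = 1e-5, 3e-5
dS = entropy(T + dT / 2, kappa + d_kappa / 2) - entropy(T - dT / 2, kappa - d_kappa / 2)
print(dS, heat_over_T(T, kappa, dT, d_kappa))
```

The two numbers agree to high accuracy. By contrast, $dQ = dT - \frac{T}{2\kappa}d\kappa$ on its own fails the cross-derivative test ($\partial_\kappa 1 = 0$ but $\partial_T \left(-\frac{T}{2\kappa}\right) = -\frac{1}{2\kappa}$), so it is not the differential of any function of $T$ and $\kappa$; dividing by $T$ is what makes it exact.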