Statistical Mechanics

$$\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}$$ $$\newcommand{\RR}{\mathbb{R}}$$ $$\newcommand{\NN}{\mathbb{N}}$$ $$\newcommand{\QQ}{\mathbb{Q}}$$ $$\newcommand{\cP}{\mathcal{P}}$$

In [205]:
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
from sympy import *

Structure of these notes

Disclaimer: Not error free! I recommend the main sources of these notes, which are: (a little bit more theoretical and physics oriented) (a little bit more practical and engineering oriented)

The structure of these notes is:

  • overview of foundational ideas (from a pretty probabilistic perspective)
  • some examples of the statistical mechanics of different systems

Some concepts which I'm presupposing knowledge of:

  • probability: distributions, information entropy
  • mechanics: Hamiltonian, phase space, basic quantum mechanics

Some general points:

  • I generally assume $k_B$ is $1$, to make it go away.

General approach

First, a high-level description of the pattern of reasoning involved in equilibrium statistical mechanics:

  • Suppose you have some system that you understand physically, in the sense of knowing its Hamiltonian and state space. You don't necessarily know the exact state of the system, that is, you have uncertainty. You can represent this uncertainty by saying you have a distribution $\rho$ over states

  • But you do have some information. First, you know, or rather assume, that the system is ergodic, so that the time average of the system is a stationary distribution $\rho$ such that $\pd{\rho}{t}=0$ (not the same as $\frac{d\rho}{dt}=0$), and such that the long-term average of the system from any starting state is $\rho$ (note the similarity to the stationary distribution obtained from Markov Chain Monte Carlo)

  • Further, you may have additional information. For example, if you know the energy at any time, and the system is isolated, then you know the energy at any other time (since energy is conserved in a closed system). Or, you might not know the exact energy, but you might know the expected energy $\langle E \rangle$ (or be able to measure it to arbitrary precision).

  • You then assume that $\rho$ is the maximum entropy distribution which satisfies your constraints. This distribution is called the (equilibrium) ensemble. Note: there are other ways to think about statistical mechanics, but this one interacts particularly nicely with Bayesian probability theory, and the principle of maximum entropy.

  • Once you have the ensemble for your system, you can calculate expected energy, as well as other thermodynamic quantities, like entropy. In fact, you can theoretically derive all of thermodynamics, which is both a validation of statistical mechanics and a motivation for it.

Microcanonical Ensemble (N, V, E)

Suppose we have a system which is closed, so its energy is constant in time (by time invariance of the Hamiltonian), and that we know this energy $E$.

Then we take as our ensemble a uniform distribution over all states with that energy. This is the microcanonical ensemble.

Its normalization constant is called $\Omega$. Its entropy is then $\log\Omega$ (typically $k_B\log\Omega$, but the Boltzmann constant is a correction of unit choices people made before equating the units of energy and temperature).

$$%for $\Omega\propto\tilde{\Omega}$, differing by some quantum corrections (e.g. divide by $N!h^{3N}$ for an ideal gas).$$

Canonical Ensemble (N, V, T)

By contrast, if we want to keep temperature fixed, but allow for the possibility of energy changing (this corresponds to a system in contact with a heat bath with which it can exchange energy), then the expected energy is a macroscopic quantity we may observe, since we assume that in equilibrium the heat exchanged is on average none (and we're not worrying about work either - we assume the systems are in contact with fixed diathermal walls).

Maximizing entropy over all distributions with fixed expected energy, we obtain the canonical ensemble. We can do this by solving a constrained optimization problem, where our distribution is represented as a vector with two constraints: it is of unit length (i.e. a probability distribution) and its expectation gives a fixed energy $E$. Depending on whether our ensemble is continuous or discrete, we can use either standard multivariate calculus or the calculus of variations (actually the continuous case is more subtle because entropy is only defined for discrete distributions - the continuous analog behaves differently). Lagrange multipliers work to impose constraints, for either approach. For the simplest case, say that $p$ is a discrete probability vector, (a unit vector in $\\R^n$ where $n$ is the number of outcomes), and we want to find $p$ which maximizes entropy. The Lagrange multiplier problem is:

$$ K = -\sum_i^np_i\log p_i + \alpha(\sum_i^np_i-1)+ \beta(\sum_i^np_iH_i-E) $$

The second constraint is that the expected energy under $p$ is $E$ (that is, $E[H]=E$, noting that $E$ is used in two ways here: with the expectation symbol, and to denote the expected energy). Then:

$$ 0 = \frac{dK}{dp_j} = -\log p_j + \beta H_j + C \Rightarrow p_i = \frac{1}{Z}e^{\beta H_j}$$

where $C$ is a constant (note that there were a bunch of constant terms we added together to make $C$) which we can solve by the constraint that $\sum_ip_i=1$, and we define $Z=\frac{1}{C}$.

A more physical derivation

Call $E_B$ the energy of the heat bath, $E$ the energy of the system, and $\hat{E}$ the total energy. Then, since $N$ and $V$ are fixed, $S_B$ is a function of $E_B$, and $S_B(E_B) = k_B\log\Omega_B$, so $\Omega_B = \exp(S_B(E_B))$. Now on the assumption of a heat bath that is large relative to the system, we can approximate $S_B(E_B)$ with arbitrary accuracy by: $S_B(\hat{E})-\frac{\partial S_B}{\partial E_B}E$ (since $E_B=\hat{E}-E$). Then, consider that the microcanonical ensemble can be applied to the joint system (the system + the heat bath), and that the probability of, marginally, just the $(q_i,p_i)$ of the system must be proportional to $\Omega(\hat{E}-H(q_i,p_i))=\Omega(E_B)\propto e^{-\frac{\partial S_B}{\partial E_B}E}=e^{-\beta E}$ for $\beta = \frac{1}{T}$.

What we obtain from either method is the Boltzmann distribution:

$$ \rho(s) = \frac{1}{Z}e^{-\beta H(s)} $$

Partition function, moments, and exponential family distributions

The normalization constant $Z(\beta)$ (note that it's a function of $\beta$) actually turns out to be useful for deriving a bunch of important quantities. In fact, this usefulness of a normalization constant is a more general property of exponential family distributions, of which the Boltzmann distribution is an example (maximum entropy distributions generally are).

Sidenote: In the language of exponential families distributions (see probability notes), we can recognize $\beta$ as the natural parameter, $s=(p,q)$ as the random variable, $\log Z$ as the log-normalizer, and $-H$ as the sufficient statistic.

What is relevant here is that:

$$\frac{\partial\log Z(\beta)}{\partial\beta}=\langle - H \rangle = -E$$

$$\frac{\partial^2\log Z(\beta)}{\partial\beta^2}= Var(-H)=Var(H)$$

So when a physicist says that the partition function is important, that's the same as when a statistician says that the log normalizer of an exponential family distribution is important.

Fluctuation-dissipation relations

$Var(H)$ in physics is often denoted by $(\Delta E)^2$. Reducing $(\Delta H)^2$ further gives us $(\Delta E)^2 = \frac{\partial}{\partial\beta}(\frac{1}{Z}\frac{\partial Z}{\partial\beta})=\frac{\partial}{\partial\beta}(\frac{1}{Z}E)=k_B\frac{\partial E}{\partial T} T^2=k_BT^2NC_v$

where the specific heat capacity is $c_v=\frac{1}{N}\left(\frac{\partial E}{\partial T}\right)_{N,V}$.

This is a fluctuation-dissipation relationship. The left-hand side describes the variance (fluctuation) of the equilibrium distribution, and the right-hand side describes the system's response (dissipation) to a change of temperature.

(N, p T) ensemble

The ensemble for a system which can exchange energy and volume with a reservoir.

Same as above derivation, but now writing

$$S_B(E_B)=S_B(\hat{E}-E_A,N_B,\hat{V}-V_A) = S_B(\hat{E},N_B,\hat{V})-\left(\pd{ S_B}{E_B}\right)_{N,V}E-\left(\pd{S_B}{V_B}\right)_{N,E}V$$

we obtain (with judicious use of Maxwell's relations): $$\rho(s,V) \propto e^{-\beta (E(s)+pV)}$$

Grand canonical ensemble ($\mu$, V, T)

$$\rho(s,N) \propto e^{-\beta (E(s)-\mu N(s))}$$

Note that the partition function marginalizes over both $N$ (all values from $0$ to $\infty$) and $s$. Even in a continuous setting for the state, the marginalization over $N$ is usually a sum, not an integral.

How ensembles work in a quantum setting

The previous description was pretty agnostic over whether the system involved in the ensemble was classical, or quantum, or other.

In quantum physics, the obect which represents full uncertainty (both uncertainty fundamental to quantum mechanics and other uncertainty) is a density matrix. The canonical ensemble is the following density matrix:

$$\rho = \frac{e^{-\beta H}}{Z}$$


$$Z=Tr(e^{-\beta H}) = \sum_n \langle n|e^{-\beta H}|n\rangle = \sum_ne^{-\beta E_n}$$

and where $|n\rangle$ is an eigenstate of the Hamiltonian and $E_n$ is an energy eigenvalue.

Note that, since for an operator $O$ and an eigenvector $O_n$, $p(O_n)=\langle O_n|\rho | O_n\rangle$, we have for $O=H$:

$$p(E_n) = \langle E_n|\rho | E_n\rangle = \frac{e^{-\beta E_n}}{Z}$$

This accords with the above description of the canonical ensemble.

How ensembles work in a classical setting

In a classical setting, we can characterize our system exactly by a point in phase space. So we expect $\rho$ to be a distribution over the phase space and $Z$ to be an integral over phase space.

As for the microcanonical ensemble, we don't actually fix a single energy, but rather constrain the energy to be in an infinitesimal region. That we should do this is clear from the dimensions: if the energy lay on a surface, we would be integrating over an $n-1$ dimensional manifold in phase space but (since we're going to be dividing by $h^{\frac{n}{2}}$, where $n$ is the dimensionality of the phase space - see below) this is dimensionally wrong.

Intuition: phase flow may be extremely chaotic, in the sense that points near each other in phase space may end up very far from each other very quickly. But the stationary distribution is the same regardless of the starting state, which gives us an escape from the chaos. We can't predict the future of any state, but can predict the future of a distribution over states very well.

Planck correction

Both $\Omega$ and $Z$ are dimensionally inconsistent, since, for example, we're going to need to take the $\log$ of $Z$. It turns out that we correct by division by $h^{\frac{n}{2}}$. This can actually be derived from the quantum physics directly, as in page 33 of these notes. Fundamentally, the factor of $h$ comes from the inner product $\langle p|q\rangle$.

Indistinguishability correction

Phase space also overcounts, in the sense that permuting particles gives an identical state which is counted separately. We divide the partition function by $N!$ to account for this.

The equilibrium ensemble should be a function of the Hamiltonian

(1) For stationarity, we need $\frac{\partial \rho}{\partial t}=0$.

(2) By Louiville's theorem, $\frac{d\rho}{d t}=\frac{\partial \rho}{\partial t} + \{\rho,H\}=0$. (Note that this required us to use the product rule, the definition of the Poisson bracket, and the definition of the Hamiltonian).

(3) If $\rho$ is a function of $H$, then $\{\rho,H\}=0$ (property of the Poisson bracket)

(4) So stationarity can be achieved if $\rho$ is a function of $H$.

Relationship between ensembles

In the thermodynamic limit, the canonical and microcanonical ensemble converge. To see this, write:

$$ Z = \sum_ie^{-\beta E_i} = \sum_{E_i}\Omega(E_i)e^{\beta E_i} \approx \Omega(E^{\\*})e^{\beta E^{\\*}}$$

where $E^{\\*}$ is the maximum value of $E_i$, and the approximation is justified since both terms in the sum scale exponentially with $E_i$ which scales linearly (energy is extensive) with $N$, which is enormous. So we can differentiate the negative of the log of both sides of the above equation by $\beta$, to obtain:

$$\langle E \rangle \approx -\pd{}{\beta}\log \Omega(E^{\\*})e^{\beta E^{\\*}} = -\pd{}{\beta}\log \Omega(E^{\\*}) + \pd{}{\beta}\beta E^{\\*} = E^{\\*}$$

In other words, the energy is effectively a constant, for $N\to\infty$ in the canonical ensemble, which is precisely the constraint of the microcanonical ensemble.

As for the variance, since $c\_v$ is an intensive quantity, we now have that $Var(H)=(\Delta E)^2=k_BT^2NC_v \Rightarrow STD(H)=\Delta E\propto N$, and since $E$ is extensive with $E\propto \sqrt{N}$, it follows that $\frac{\Delta E}{E}\propto \frac{1}{\sqrt{N}}\to 0$ as $N\to \infty$.

Useful mathematics

A number of other variables, known as thermodynamic potentials, turn out to be more useful than energy $E(S,V,N)$.

Helmholtz free energy

The first is the Helmholtz free energy $A(T,V,N) = E(S,V,N) - TS$. Looking at the differentials: $dE = TdS - pdV + \mu dN$, so that $dA = TdS - pdV + \mu dN - TdS - SdT = \mu dN - SdT - pdV$. (Note that physicists use $F$ for Helmholtz free energy).

To understand this, note that if our experimental set up has fixed temperature and number, but variable pressure (e.g. a chamber of gas in thermal contact with a heat bath, with volume controlled by a piston) then $dT=0$ and $dN=0$, so that $dA = -pdV = \bar{dW}$. In other words, the change in free energy is the amount of work you get out or put in to the system.


$$H = E + pV $$

$$dH = TdS - pdV + \mu dN + pdV + Vdp = TdS + \mu dN + Vdp $$

Gibbs free energy (aka Gibbs free enthalpy)

$$G = A + pV $$

$$dG = \mu dN - SdT - pdV + pdV + Vdp = \mu dN - SdT + Vdp $$

Useful for systems with fixed temperature and pressure.

Legendre and Laplace transforms

The transformations which take one thermodynamic potential to another are Legendre transforms, while the transformations which take one ensemble normalization constant to another are Laplace transforms. For instance:

$$ Z(\beta) = \int e^{-\beta H} dpdq = \int \Omega(E) e^{-\beta E} dE = L(\Omega)(\beta)$$

$$ A = E - TS $$

The first equation shows that the normalization constant for a system at fixed temperature, volume and number, i.e. canonical $(T,V,N)$ ensemble, is the Laplace transform of the normalization constant for a system at fixed entropy, volume and number, i.e. the microcanonical $(S,V,N)$ ensemble. The second equation shows that the thermodynamic potential for a system at fixed temperature, volume and number, i.e. the Helmholtz free energy $A$, is the Legendre transform of the thermodynamic potential for a system at fixed entropy, volume and number, i.e. the energy.

Note that the first equation contains the insight that you can obtain the partition function by summing/integrating over energies, so long as you keep track of their degeneracies.

Similar transformations exist between each of the ensembles, and things commute. Which is to say that, starting with one normalization constant, you derive the corresponding potential and then Legendre transform, or Laplace transform and then derive the corresponding potential.

Maxwell relations

These come up constantly in thermodynamic calculations

First Maxwell Relation:

Consider an injective function $z(x,y)$. Then let $a=\pd{z}{x}$ and $b=\pd{z}{y}$. $dz = adx + bdy$, so:

$$\pd{a}{y} = \pd{b}{x}$$

Second Maxwell Relation:

Informal derivation. Viewing $z$ as a function of $x,y$, $dz = \left(\pd{z}{x}\right)_ydx + \left(\pd{z}{y}\right)_xdy$, but viewing $x$ as a function of $y,z$, we have instead: $dx = \left(\pd{x}{y}\right)_zdy + \left(\pd{x}{z}\right)_ydz$. Combining these:

$$ dz = \left(\pd{z}{x}\right)_y(\left(\pd{x}{y}\right)_zdy + \left(\pd{x}{z}\right)_ydz) + \left(\pd{z}{y}\right)_xdy = \left(\pd{z}{x}\right)_y\left(\pd{x}{y}\right)_zdy + \left(\pd{z}{y}\right)_xdy \Rightarrow \left(\pd{z}{x}\right)_y\left(\pd{x}{y}\right)_z = - \left(\pd{z}{y}\right)_x $$

Gibbs-Duhem relation

For $f:\RR^n\to\RR$ homogeneous of first order (so that $f(\lambda x_1,\ldots,\lambda x_n)=\lambda f(x_1,\ldots, x_n)$ ), we have $f(x_1,\ldots x_n) = \sum_i^n\pd{f}{x_i}x_i$. (This is sometimes called Euler's theorem.)

$E$, in terms of $S, V, N$, is homogenous of first order.

As a consequence, we have:

$$ E = TS - pV + \mu N $$

(One further consequence is that $G = \mu N$, or $\mu = \frac{G}{N}$.)

But note that this implies:

$$dE = SdT+TdS - Vdp-pdV + Nd\mu + \mu dN$$

Also note that by definition, we have

$$ dE = TdS - pdV + \mu dN $$

As a consequence, we arrive at something surprising, the Gibbs-Duhem relation:

$$ SdT - Vdp + Nd\mu = 0 $$

Interpret this as saying: the differentials of the intensive quantities are not linearly independent.

As an example of $ E = TS - pV + \mu N $ for a concrete model, consider the ideal gas, where this all looks much more surprising:

$$S = k_BN(\log(\frac{V}{N}(\frac{4\pi mE}{3N\hbar^2})^{\frac{3}{2}})+\frac{5}{2}) \Rightarrow E = \exp(\frac{2S}{3k_BN}-\frac{5}{3})\cdot (\frac{N}{V})^{\frac{2}{3}}\frac{3N\hbar^2}{4\pi mE}$$


$$ T = (\frac{\partial E}{\partial S})_{V,N} = \frac{2}{3k_BN}\cdot\exp(\frac{2S}{3k_BN}-\frac{5}{3})\cdot (\frac{N}{V})^{\frac{2}{3}}\frac{3N\hbar^2}{4\pi mE}=\frac{2}{3k_BN}\cdot E$$

$$ p = -(\frac{\partial E}{\partial V})_{S,N} = (\frac{2}{3})\cdot(V^{-1})\cdot \exp(\frac{2S}{3k_BN}-\frac{5}{3})\cdot (\frac{N}{V})^{\frac{2}{3}}\frac{3N\hbar^2}{4\pi mE} = \frac{2}{3V}E$$

$$ \mu = (\frac{\partial E}{\partial N})_{S,V} = -\frac{2SE}{3k_BN^2}+\frac{5NE}{3}=\frac{E}{N}(\frac{5}{3}-\frac{2S}{3k_BN})$$

And therefore:

$$TS-pV+\mu N = \frac{2}{3k_BN}\cdot ES - \frac{2}{3}E+E(\frac{5}{3}-\frac{2S}{3k_BN})=E $$


Thermodynamics is a macroscopic theory, which can be given independently of statistical mechanics, but which statistical mechanics can derive. Below is a quick review of it as a free standing theory, starting from the thermodynamic laws.

To keep things simple, assume the state of a macroscopic system is determined by two quantities: $p$ and $V$. (In general, there may be others too.)

Systems are in equilibrium if their states do not change when placed in thermal contact.

The movement of a system along a path in $p,V$ space corresponds to a quasistatic thermodynamic process. To be quasistatic means that the system stays in equilibrium at all times - otherwise $p$ and $V$ would not be defined. A reversible process is one which is also quasistatic along the path in reverse.

Carnot cycle

As a concrete example of a quasistatic reversible process, consider the Carnot cycle. This is a loop in $p,V$ space, and works as follows. We have two heat baths (systems whose temperature is effectively constant), $A$ and $C$, where $A$ is the hotter of the pair. We also have a cylinder $B$ of an ideal gas, whose pressure can vary.

The process is constructed in four steps (i.e. a piecewise path of 4 segments).

  • 1: start with $B$ in thermal equilibrium with $A$, place $B$ in contact with $A$, and have the gas in $B$ expand such that the temperature of $B$ remains constant while heat is received from $A$. This transfers heat into work, isothermally (fixed temperature).
  • 2: isolate $B$ and let volume increase, so that $B$'s pressure and temperature decrease. This is adiabatic (no heat transfer).
  • 3: place $B$ in contact with $C$, and have the gas compress, adiabatically, giving heat to $C$.
  • 4 thermally isolate $B$ and decrease volume, so pressure and temperature increase.

Note that to return to its starting position, $B$ has to give away some of the heat it took from $A$ back to $C$.

Zeroth Law

Equilibrium is transitive. For systems, $A$, $B$ and $C$ in equilibrium, if $A$ and $B$ are in equilibrium, and likewise $B$ and $C$, then so are $A$ and $C$.

Let $F_{AB}(V_A,p_A,V_B,p_B)=0$ represent the condition of equilibrium, so that solving for $V_B$, we have $V_B = f_{AB}(V_A,p_A,p_B)$ (for some $f$). In other words, $V_A$ is the unique volume such that $A$ and $B$ are in equilibrium. Similarly $V_B = f_{CB}(V_A,p_A,p_C)$, so that


But by the zeroth law, $A$ and $C$ are in equilibrium, so that


So if both of the above equations are equivalent, then the first cannot depend on $p_B$, and so for some function $\theta$, $\theta(V_A,p_A)=\theta(V_C,p_C)$.

In other words, the transitivity of equilibrium (zeroth law) implies the existence of temperature function $\theta$ on $p,V$ space (but doesn't specify this function).

First Law

Energy $E(p,V)$ can be regarded as a function of the state. Consequently, $dE$ can be integrated around any loop in the $p,V$ space to obtain $0$, i.e. $\forall\pi: \oint_{\pi}dE=0$, where $\pi$ is a loop. (In the language of differential geometry, $dE$ is an exact differential 1-form.)

In particular, $dE = \bar{dQ}+\bar{dW}$, so we have that $\oint_{\pi}dE = \oint_{\pi}\bar{dQ} + \oint_{\pi}\bar{dW} = 0$ and consequently:

$$ \oint_{\pi}\bar{dQ} = -\oint_{\pi}\bar{dW} $$

(In the language of differential geometry, $\bar{dQ}$ and $\bar{dW}$ are differential 1-forms which are not exact, and as a result, their integral between two points is path-dependent).

In the general case, neither $\oint_\pi dQ$ nor $\oint_\pi dW$ is $0$. Suppose then that $\oint_\pi dQ > 0$, in other words, that heat is gained by the process. Then work is lost ($\oint_\pi dW < 0$), and for the reverse process $R(\pi)$, work is gained but heat is lost.

Second Law

Best intuition: your train can't run by just leaving behind ice cubes, becoming hotter, and turning that heat into work.

Two equivalent statements:

  • there is no process whose net effect is only to transfer heat from a higher temperature system to a lower temperature system.
  • there is no process whose net effect is only to to transfer heat from a system into work.

To show equivalence from the second to the first, suppose you have a process $\pi$ which transfers heat to work and nothing else. Then use it to extract heat from a hot body (i.e. high temperature system) and feed the resulting work into a fridge, which takes heat from a cool body to the hot body. Then the net result is a violation of the first statement.

Considering a process running between a hot body and a cool body, let $Q_H$ be the heat taken from the hot body by the process, and let $Q_C$ be the heat given to the cool body. Then the efficiency $\eta$ is:

$$\eta = \frac{W}{Q_H} = \frac{Q_H-Q_C}{Q_H} = 1 - \frac{Q_C}{Q_H} $$

So in this setting, the goal is to try to take heat from the hot body, convert as much as possible into work, and give as little back as possible to the cold body.

A direct consequence of the second law is that $\eta < 1$.

A less direct consequence is that a reversible engine has the best efficiency. To see this, suppose we have a process $\tau$ which takes heat from the hot to the cool body, and gives its work to a reversible engine $R(\pi)$ running in reverse. Let $Q_H$ and $Q_C$ be as before, for system $\tau$, and $Q'_H, Q'_C$ be the heat given to the hot bath and taken from the cold bath respectively by $R(\pi)$.

By the second law, considering the two processes as a single process, $Q_H \gt Q'_H $, since otherwise heat would be flowing into the hot body at no work cost (all the work transfer is internal to the joint system, not to anywhere else). Also, we have that $Q_H-Q'_H = Q_C-Q'_C$. But then:

$$\eta(\tau) = 1 - \frac{Q_C}{Q_H} = 1 - \frac{Q_H-Q'_H+Q'_C}{Q_H} = \frac{Q'_H-Q'_C}{Q_H} \leq \frac{Q'_H-Q'_C}{Q'_H} = \eta(R(\pi)) $$

Further, any two reversible engines are equally efficient, when both operating between the same pair of heat baths, by the above argument applied in both directions. Note that efficiency is a function of the temperatures of the two heat baths, $\eta(T_1,T_2)$.

A more intuitive statement/consequence of the second law is that $dS > 0$ for an isolated system. Note that this isn't change in time, but rather change with respect to varying the other macroscopic parameters. So this means that the choice of $p$ and $V$ in an equilibrium state is the one which maximizes $S$. To see an example of why this is true, suppose we have system consisting of a hot body $1$ in contact with a cool body $2$, which can exchange heat but not volume ($dV=0$). Then:

$$ dS \approx dS_1 + dS_2 = \pd{S_1}{U_1}dU_1 + \pd{S_2}{U_2}dU_2 = \pd{S_1}{U_1}dU_1 - \pd{S_2}{U_2}dU_1 = (\frac{1}{T_1}-\frac{1}{T_2})dU_1 $$

Since, by the second law, heat flows from hotter to cooler, and $T_1>T_2$, $dU_1 \lt 0$ and $\frac{1}{T_1} \lt \frac{1}{T_2}$, so $dS > 0$.

Temperature, thermodynamically

Recall that from the zeroth law, we know that temperature is some function on $p,V$ space, i.e. $f(p,V)$.

By then consider a composition of Carnot engines. The first operates between $T_1$ and $T_2$, at $\eta(T_1,T_2)$, and the second between $T_2$ and $T_3$, at efficiency $\eta(T_2,T_3)$. Note that $\eta(T_1,T_2) = 1 - \frac{Q_2}{Q_1} \Rightarrow \frac{Q_2}{Q_1} = 1 - \eta(T_1,T_2)$.

If $Q_1$ is heat taken in of first engine, and $Q_2$ is heat put out, then $Q_2 = Q_1\frac{Q_2}{Q_1} = Q_1(1-\eta(T_1,T_2))$.

Further, suppose the second engine takes the same amount of heat, so that $Q_3 = Q_2(1-\eta(T_2,T_3)) = Q_1(1-\eta(T_1,T_2))(1-\eta(T_2,T_3)) $.

But by composition of heat engines, we also have: $Q_3 = Q_1(1 - \eta(T_1,T_3))$.

Which means that $1 - \eta(T_1,T_3) = (1-\eta(T_1,T_2))(1-\eta(T_2,T_3))$

Solving this functional equation yields: $1 - \eta(T_1,T_2) = \frac{f(T_2)}{f(T_1)}$

But $T$ is only defined up to some function of $p,V$ anyway, so WLOG, we can choose $T$ to be such that:

$$ 1 - \eta(T_1,T_2) = \frac{T_2}{T_1} $$

Note that then $\frac{Q_2}{Q_1} = \frac{T_2}{T_1}$, and so $\frac{Q_1}{Q_2}=\frac{T_1}{T_2}$

Note also that by calculating the efficiency of the Carnot cycle explicitly, we can show that for $T$ defined such that $1 - \eta(T_1,T_2) = \frac{T_2}{T_1}$, it is also the case that $T=\frac{pV}{nk_B}$, where this is the formula for an ideal gas.

Entropy, thermodynamically

For any Carnot cycle (a kind of process), we have $\oint_\pi \frac{\bar{dQ}}{T} = 0$. This is because the change in $Q$ takes place in the two isothermal sections, and amounts to $\frac{Q_1}{T_1}-\frac{Q_2}{T_2} = 0$.

Since a Carnot cycle is homotopic to any other loop, we can smoothly deform between them, preserving $\oint_\pi \frac{\bar{dQ}}{T} = 0$ for any reversible process $\pi$. (Note: there's usually an argument given about chopping corners off the Carnot cycle to deform it, but hopefully what I said amounts to the same thing.)

As a consequence, $dS$ defined as $\frac{\bar{dQ}}{T}$ is an exact differential (it's a theorem that any differential 1-form which is $0$ integrated over loops is exact) and so we obtain a function $S(p,V)$.

Further, we now have $dE = \bar{dQ}+\bar{dW} = TdS -pdV$, which is the familiar form of the energy in terms of entropy from statistical mechanics.

Free energy, thermodynamically

At fixed temperature, $dA = -pdV$, so along a path from state $X$ to state $Y$, $F(Y)-F(X) = -\int_\pi pdV = -W$.

So as observed earlier, the free energy measures the amount of work you can do.

Relationship between statistical mechanics and thermodynamics

The idea is that in the limit of large system size, the variance of the ensemble over the size will tend to $0$. Having $0$ variance means that we can experimentally measure quantities by sampling them, to get accurate measures of expectations, which are precisely physical observables.


The statistical mechanical notion of temperature agrees with thermodynamics in that energy flows from hotter to cooler. Some nice calculus with differentials shows this. Some assumptions first:

  • we have two systems $E_1$ and $E_2$, and that $\frac{1}{T_1}=\frac{dS_1(E_1)}{dE_1}=\frac{dS_1(E_2)}{dE_2}=\frac{1}{T_2}$.
  • $S = S_1+S_2$ (entropy is additive)
  • $dE_1=-dE_2$ (first law/ conversation of energy)
  • $dS > 0$ (second law)


$$ 0 \lt dS = \frac{dS_1}{dE_1}dE_1 + \frac{dS_2}{dE_2}dE_2 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right )dE_1 $$

Then if heat flows from system $1$ to $2$, i.e. $dE_1 \lt 0$, we conclude that $T_2 \lt T_1$, as desired.

Now we show that $\beta$ in the canonical ensemble is $\frac{1}{T}$, taking a discrete ensemble for ease of calculation:

$$ S = -\sum_ip_i\log p_i = -\frac{1}{\tilde{Z}}\sum_ie^{-\beta H_i}\cdot(-\log \tilde{Z} -\beta H_i)$$

$$=\log \tilde{Z}\frac{1}{\tilde{Z}}\sum_ie^{-\beta H_i}+\beta\frac{1}{\tilde{Z}}\sum_i e^{-\beta H_i}\cdot H_i=\log \tilde{Z}+\beta E$$


$$dS = \beta dE + Ed\beta + \frac{\partial \log \tilde{Z}}{\partial \beta}d\beta = \beta dE \Rightarrow \beta = \frac{\partial S}{\partial E} = \frac{1}{T} $$


From a probabilistic perspective, entropy is a measure of the uncertainty of a distribution, but it turns out that this information-theoretic entropy is closely related to the notion of entropy relevant for thermodynamics. Boltzmann's equation gives the latter as:

$$ S(E) = k_B\log(\Omega(E)) $$

where $k_B$, the Boltzmann constant, is a dimension changing constant to account for the fact that temperature and energy are traditionally measured in different dimensions, and where $\Omega(E)$ is the number of states of the system with energy $E$. Note that in these notes I usually assume units in which $k_B=1$, for convenience, and because it's more natural - $k_B$ only arises because temperature is not measured in units of energy.


Once we introduce volume, so that the equation of state is $S=S(E,V)$, then we get a new quantity, pressure:

$$p = T\left(\frac{dS}{dV}\right)_{T} = \left(\frac{dE}{dS}\right)_{V}\left(\frac{dS}{dV}\right)_{T} = -\left(\frac{dE}{dV}\right)_S$$

Here, I am using the second Maxwell relation (see section on thermodynamics).

We note that, holding energy fixed, and assuming that $dV_1=-dV_2$, in a situation where systems $1$ and $2$ can exchange volume (e.g. via a freely moving piston):

$$ 0 \lt dS = \frac{dS_1}{dV_1}dV_1 + \frac{dS_2}{dV_2}dV_2 = \left( \frac{p_1}{T} - \frac{p_2}{T} \right )dV_1 $$

In other words, if system $2$ has higher pressure, system $1$ loses volume to it.

Then note that $dS = \pd{S}{E}dE + \pd{S}{V}dV = TdE-pdV$

As another sanity check on this statistical notion of pressure, consider a volume of gas contained by a piston. Let $x$ denote the position of the piston, so that as $x$ increases, $V$ decreases. Then note that if $p=\frac{F}{A}$, the more intuitive macroscopic definition, then $Fdx = pAdx=-pdV$.

In this vein, we can view $-pdV$ as the work done on the system.

Heat capacity

Heat capacity, $\frac{dE}{dT}$ is important because it is both calculable in your models and actually easy to measure.

$\frac{dS}{dT} = \frac{dS}{dE}\frac{dE}{dT} = \frac{C}{T} \Rightarrow dS = \frac{C}{T}dT$. This is a useful formula, since we now have an expression for the differential of $S$ in terms of things we can measure, so can calculate entropy change in real systems.

Second Law of Thermodynamics

From a statistical mechanical point of view, note that phase space volume is conserved, so in theory, you should always have the same amount of entropy, if you started with a uniform distribution in phase space. The key to contradicting this is to note that there is some limit on resolution, and systems may be chaotic.

So a chaotic system evolving from a small region of phase space may distribute this volume over the space, getting arbitrarily close to any point in the phase space (ergodicity). If your resolution is not infinitely good, coarse graining (replacing the continuous space with discrete chunks which either are or aren't filled) will increase volume, and thus increase entropy.

So in a sense, increase of entropy is a consequence of loss of information.

Phase transitions

Discontinuities or divergences in derivatives of partition function (or other related quantities). Corresponds to qualitative changes in the behaviour of a system, such as solid to liquid.

Ising Model

Known as Markov random fields in probability theory. Useful, because they are analytically solvable (with difficulty) in 1 and 2 dimensions, and exhibit phase transitions. They are also a model of ferromagnetism and lattice gases.

Let $s_i$ be the $i$th spin, taking values in $\{-1,1\}$. Then we can define a Hamiltonian:

$$ H = -J\sum_{\langle s_i,s_j\rangle}s_is_j - B\sum_is_i$$

The first sum is only over neighbours. $J$ is a constant expressing the preference for neighbouring spins to align or disalign, and $B$ represents a magnetic field.

Note also that we assume periodic boundary conditions: the lattice repeats every $n$ spins.

1D: Transfer matrix method

In the 1D case, we can solve exactly even for both $J\neq 0$ and $B\neq 0$, using the transfer matrix approach.

First note that for the matrix $P$ of a linear operator $\phi$ on $\RR^2$ (i.e. a two by two matrix), we have:

$$Tr(P^n) = \sum_{s_1}\ldots\sum_{s_n} P_{s_1s_2}\ldots P_{s_ns_1}$$

So if we choose

$$P_{s_1s_2} = e^{\beta Js_1s_2 - \frac{Bs_1}{2}- \frac{Bs_2}{2} } $$

then we obtain:

$$Tr(P^n) = Z$$

But this is great, because $P$ is symmetric, since spin interaction is symmetric, which means that we can calculate $Tr(P^n)=\left(\lambda_1+\lambda_2\right)^n\to_{n\to\infty} \lambda_1^n$, where $\lambda_1,\lambda_2$ are the eigenvalues (WLOG $\lambda_1\gt \lambda_2$) corresponding to the eigenbasis (which exists, by the spectral theorem, for symmetric matrices).

In code:

In [206]:
spin_values = [1,-1]

beta, J, h = sp.symbols("beta J h")

def make_transfer_matrix(beta):

    return Matrix([[sp.exp(beta*(J+h)),sp.exp(-beta*J)],[sp.exp(-beta*J),sp.exp(beta*(J-h))]])

M = make_transfer_matrix(beta)
eigs = list(M.eigenvals())
$$\left ( \left [ - \frac{1}{2} \sqrt{\left(e^{\beta \left(J - h\right)} + e^{\beta \left(J + h\right)}\right)^{2} - 8 \sinh{\left (2 J \beta \right )}} + \frac{1}{2} e^{\beta \left(J - h\right)} + \frac{1}{2} e^{\beta \left(J + h\right)}, \quad \frac{1}{2} \sqrt{\left(e^{\beta \left(J - h\right)} + e^{\beta \left(J + h\right)}\right)^{2} - 8 \sinh{\left (2 J \beta \right )}} + \frac{1}{2} e^{\beta \left(J - h\right)} + \frac{1}{2} e^{\beta \left(J + h\right)}\right ], \quad \left[\begin{matrix}e^{\beta \left(J + h\right)} & e^{- J \beta}\\e^{- J \beta} & e^{\beta \left(J - h\right)}\end{matrix}\right]\right )$$

Once we have the partition function, we can calculate everything of interest. In particular, we find that the free energy, energy, specific heat, etc are all analytic. There is no phase transition in the 1D Ising model.

Mean field approximation

The 2D model is analytically solved (by Onsager, using representation theory, and more simply later on) but in general, a useful approximate method of solution for Ising models is mean-field theory.

The idea is to make $Z$ tractable to compute by claiming:

$$ H \approx (B+ zJm)\sum_i s_i$$

where $z=2d$, for $d$ the number of dimensions, is the number of neighbours of a given spin, and where $m=\langle s_i \rangle$ is the marginal expectation of the $i$th spin (they all have the same expectation). In other words, there is an effective field $B_{\mathit{eff}}$ obtained by adding a mean field to the magnetic field.

There are various justifications for this, ranging in levels of formality, but the basic idea is that we are assuming that fluctuations around the mean magnetization are vanishingly small.

Now that we've done this, we have a very solvable problem, since now

$$Z = e^{\beta B_{\mathit{eff}}}+ e^{-\beta B_{\mathit{eff}}} = 2^N\cosh^N(\beta B_{\mathit{eff}})$$

The only catch is that $B_{\mathit{eff}}$ is a function of $m$, which we need to solve for. But $m$ is a function of $B_{\mathit{eff}}$. In particular:

$$ m = \sum_{s_i\in\{1,-1\}}p(s_i)s_i = \frac{1}{Z}\left(e^{-\beta B_{\mathit{eff}}}-e^{-\beta B_{\mathit{eff}}}\right) = \frac{2\sinh(\beta B_{\mathit{eff}})}{2\cosh(\beta B_{\mathit{eff}})} = \tanh(\beta B_{\mathit{eff}}) = \tanh(\beta (B + zJm)) $$

The intersection points on the graph below show the values of $m$ for which this holds, varying $\beta zJ$ and assuming $B=0$. Note that when $\beta zJ \gt 1$, we get $3$ intersection points.

The interpretation is that, crossing from above to below the temperature $\frac{1}{\beta zJ}$, there is a phase transition and new equilibrium values of $m$ emerge other than $m=0$. In fact, $m=0$ is unstable below the critical temperature, and given a magnetic field, only one of the two non-origin intersections is stable (the one aligning with the field). The other is metastable: if the system is cold, lack of (speedy) ergodicity will mean that the system could get stuck in the metastable state. This would be a state where all the spins faced opposite to the magnetic field, but since they would have to disagree in alignment in order to change, they remain facing the wrong way, but aligned.

In [207]:
m,zJ = sp.symbols("m zJ")
plot(tanh(m*0.5),m,(m, -2, 2))
plot(tanh(m),m,(m, -2, 2))
plot(tanh(m*5),m,(m, -2, 2))
<sympy.plotting.plot.Plot at 0x114f279b0>

Dynamics and Monte Carlo

Note that we have never specified any notion of time evolution of the Ising model, so statements like the above, where I talk about the system getting stuck, are sort of meaningless. What they're alluding to is a Monte Carlo sampling procedure to approximate $Z$. In particular, the style of sampling is often called Gibbs sampling in the statistical literature: the kernel of the Markov chain involves trying to change a single spin at a time, while keeping the others fixed. If we take this updating procedure to be a model of the time evolution of the system, then this is what results in the "getting stuck" in an aligned state behavior. It's interesting to consider the relationship between time evolution and Markov Chain Monte Carlo sampling.

Renormalization group

It's not actually a group in the mathematical sense. The idea is to take an Ising model, and think about a system consisting of blocks of spins, where each block is regarded as a spin in a new Ising model with a new value of $J$. This process is called renormalization, and can be regarded as a function taking $\beta J$ to a new value. The clever part is that at the critical temperature, renormalization should be a fixed point, essentially because this is what the critical temperature means: the temperature where the fluctuations in the system are infinite, so that everything is scale invariant. The details get pretty complicated.

It turns out that quantities called critical exponents, which are related to derivatives near the critical points of a system, are surprisingly universal. For example, the critical exponents for the Ising model, and the totally different looking Van der Waals fluid model are the same. Critical exponents are calculable using methods related to the normalization group, but again, this is a complicated subject.


Kinetics is the application of actual mechanics to obtain physical behaviors of large systems. So for example, you could take the Maxwell speed distribution of particles in an ideal gas, and look at their collisions with the sides of a box, for example, to work out how much force they should impart, and thus what the pressure is.

Examples of systems

Two spin system

A state $s$ is an array of $N$ Booleans, and the energy is $E=\epsilon\cdot N_T(s)$, where $N_T(s)$ is the number that are true (or spin-up, if you prefer).

Note that in this system, we have fixed number and volume, so don't have to worry about those.

Two Spin System: Microcanonical Ensemble

The microcanonical ensemble is binomial, and so the entropy is $0$ at $E(s)=N_T(s)\epsilon=0$.

Using Stirling's approximation, entropy at $E=\epsilon N_T$: ${N \choose N_T} = \log N! - \log N_T - \log (N-N_T)$ $ \approx N\log N - N - N_T\log N_T + N_T - (N-N_T)\log(N-N_T) + N - N_T$

Symbolically solving, to save effort, and working in units where $k_B=1$, we calculate $S(E)$ and $E$ in terms of $T$

In [208]:
N, NT, E, T, epsilon = sp.symbols("N NT E T epsilon")

def stirling_log_fact(x):
    return x*sp.log(x)-x

S = stirling_log_fact(N)-stirling_log_fact(NT)-stirling_log_fact(N-NT)
# S
SE = S.subs(NT,(E/epsilon))

# give general rules, and then instantiate properly
$$- \frac{E}{\epsilon} \log{\left (\frac{E}{\epsilon} \right )} + N \log{\left (N \right )} - \left(- \frac{E}{\epsilon} + N\right) \log{\left (- \frac{E}{\epsilon} + N \right )}$$
In [209]:
/usr/local/anaconda3/envs/pyt/lib/python3.6/site-packages/sympy/plotting/ UserWarning: The evaluation of the expression is problematic. We are trying a failback method that may still work. Please report this as a bug.
  warnings.warn('The evaluation of the expression is'
<sympy.plotting.plot.Plot at 0x114f86f98>

From this, we see that temperature is negative when $E \gt \frac{N_T}{2}$. Further, negative temperatures are hotter than positive ones, in the sense that heat flows from a system with negative temperature to a system with positive temperature. To see this, suppose we place system $1$ with positive temperature next to system $2$ with negative temperature. $dS \gt 0$, so

$\frac{dS_1}{dE_1} = \frac{1}{T_1} \gt 0$ and $\frac{dS_2}{dE_2} = \frac{1}{T_2} \lt 0$

Then energy must be added to system $1$ to increase energy, and removed from system $2$.

Schottky anomaly

Inverting the equation for $S(E)$ to obtain $E(S)$, we see that:

$$E = \frac{N \epsilon}{e^{\frac{\epsilon}{T}} + 1}$$

Heat capacity is defined as:

$$C = \frac{dE}{dT}$$

Graphing this, as below, shows a non-monotonic curve. This non-monotonicity is known as the Schottky anomaly, anomalous relative to the monotonic heat capacity for many types of systems.

In [210]:
C = diff(solve(eq1,E)[0],T)
$$\frac{N \epsilon^{2} e^{\frac{\epsilon}{T}}}{T^{2} \left(e^{\frac{\epsilon}{T}} + 1\right)^{2}}$$
In [211]:
eq1 = Eq(1/T,sp.simplify(diff(SE,E)))
$$\frac{1}{T} = - \log{\left (E \right )} + \log{\left (- E + 1 \right )}$$
In [212]:
<sympy.plotting.plot.Plot at 0x11526f710>

Two Spin System: Canonical Ensemble

First we calculate $Z$:

In [213]:
Z = (sp.exp(-0/T)+sp.exp(-epsilon/T))**N
$$\frac{N \epsilon}{e^{\frac{\epsilon}{T}} + 1}$$

$$\langle E \rangle = \frac{N \epsilon}{e^{\frac{\epsilon}{T}} + 1}$$

Note that this agrees with the same result obtained from the microcanonical ensemble and a Legendre transform (see above). Good sanity check.

Classical Harmonic Oscillator

Microcanonical ensemble

$1 = \int d^{6N}\mu \rho = C'\cdot A$, for $A=\int_{E \leq \sum_{i}^{3N}\frac{p_i^2}{2m}+\frac{kq_i^2}{2}\leq E+\delta E} dq_1...q_n,p_1...p_n$

To evaluate this integral, we can note that for $\tilde{\Omega}(E)=\int_{\sum_{i}^{3N}\frac{p_i^2}{2m}+\frac{kq_i^2}{2}\leq E} dq_1...q_n,p_1...p_n$, we have that $A=\tilde{\Omega}(E+\delta E)-\tilde{\Omega}(E)$.

$\tilde{\Omega}(E)$ is can be obtained by the formula for the volume of a sphere, and a change of variables to $(\frac{p_1}{\sqrt{m}}...\frac{p_n}{\sqrt{m}},\sqrt{k}q_1...\sqrt{k}q_n)$, which incurs a Jacobian determinant of $(\frac{m}{k})^{\frac{3N}{2}}$. In particular:

$$\tilde{\Omega}(E) = \int_{\sum_{i}^{3N}(\frac{p_i}{\sqrt{m}})^2+(\sqrt{k}q_i)^2\leq 2E} dq_1...dq_n,dp_1...dp_n = (\frac{m}{k})^{\frac{3N}{2}} \int_{\sum_{i}^{3N}(\frac{p_i}{\sqrt{m}})^2+(\sqrt{k}q_i)^2\leq 2E} d\frac{p_1}{\sqrt{m}}...d\frac{p_n}{\sqrt{m}},d\sqrt{k}q_1...d\sqrt{k}q_n = (\frac{m}{k})^{\frac{3N}{2}} Sph(6N,\sqrt{2E})$$


So in the limit of small $\Delta E$, we have that $\frac{A}{\Delta E}=\frac{\partial A}{\partial E} = \sqrt{\frac{m}{k}}^{3N}\frac{\pi^{3N}(2E)^{3N}}{\Gamma(3N+1)}=\frac{6N\sqrt{\frac{m}{k}}^{3N}\pi^{3N}(2E)^{3N-1}}{\Gamma(3N+1)}$

Then, again in the limit of small $\Delta E$, $C'=\frac{1}{\Delta E}\frac{\Gamma(3N+1)}{6N\sqrt{\frac{m}{k}}^{3N}\pi^{3N}(2E)^{3N-1}}$

Canonical ensemble

Let's consider a single, 1D classical harmonic oscillator. I use $\mu$ to denote a point in phase space, with coordinates $(p,q)$. Assume units where $k_B$ is $1$.


$$Z = \int e^{-\beta H(\mu)}d\mu = \left(\int_{\RR} dp e^{-\beta T(p)}\right)\cdot\left( \int_{\RR} e^{-\beta V(q)} \right) = \sqrt{\frac{2m\pi}{\beta}}\cdot \sqrt{\frac{2\pi}{\beta k}} = \frac{2\pi}{\omega\beta}$$

where $\omega$ is $\sqrt{\frac{k}{m}}$, which is the frequency of the oscillator.

So $E = -\frac{\partial \log Z(\beta)}{\partial\beta} = \frac{1}{\beta} \Rightarrow E = T$

This means that for any spring constant, the oscillator is equal to the temperature. This turns out not to be the correct model for nature: it turns out that at low temperatures and high spring constants, heat capacity decreases. Explaining this requires quantum mechanics.

Classical ideal gas

Let $m$ be the mass of a single particle, and assume we have $N$ non-interacting particles trapped in a volume $V$. Let $U$ be the potential function, so that $U(q_i)=0$ if the particle is in the box, and $\infty$ otherwise.

Fixed energy (microcanonical)

First choose some value of $E$. Then:

$$N!h^{3N}\rho(p,q) = \frac{1}{\Omega(E)}$$

where $$\Omega(E) = \int_{E \leq H(p,q) \leq E + \Delta E} dp_i\ldots dp_n,dq_i\ldots dq_n = \int_{E \leq \sum_i^{3N}\frac{p_i^2}{2m} + U(q_i) \leq E + \Delta E} dp_i\ldots dp_n,dq_i\ldots dq_n $$

$$ = \int_{\sum_i^{3N}\frac{p_i^2}{2m} + U(q_i) \leq E + \Delta E} dp_i\ldots dp_n,dq_i\ldots dq_n - \int_{\sum_i^{3N}\frac{p_i^2}{2m} + U(q_i) \leq E} dp_i\ldots dp_n,dq_i\ldots dq_n = \tilde{\Omega}(E+\Delta E) - \tilde{\Omega}(E) $$

where $\tilde{\Omega}(J) = \int_{\sum_i^{3N}\frac{p_i^2}{2m} + U(q_i) \leq J} \frac{1}{\Omega(E)} dp_i\ldots dp_n,dq_i\ldots dq_n $.

Note that

$$\tilde{\Omega}(J) = V^N\cdot Vol(S_{3N,\sqrt{J}})$$

where the volume of the d-sphere with radius $R$ is $Vol(S_{d,R})=\frac{\pi^{\frac{d}{2}}\sqrt{2mE}^{d}}{\Gamma(\frac{d}{2}+1)}$

Since $\Delta E$ is differentially small, we have:

$$ \frac{1}{\Delta E}N!h^{3N}\rho(p,q) = \frac{\tilde{\Omega}(E+\Delta E) - \tilde{\Omega}(E) }{\Delta E} \to \tilde{\Omega}'(E) $$


$$\tilde{\Omega}'(E) = \frac{d}{dE}V^N\frac{\pi^{\frac{3N}{2}}\sqrt{2mE}^{3N}}{\Gamma(\frac{3N}{2}+1)} = \frac{3N}{2}\cdot 2m\cdot V^N\frac{\pi^{\frac{3N}{2}}(2mE)^{\frac{3N}{2}-1}}{\Gamma(\frac{3N}{2}+1)} = \frac{3N}{2E} \cdot V^N\frac{(2\pi mE)^{\frac{3N}{2}}}{\Gamma(\frac{3N}{2}+1)} = \frac{1}{E} \cdot V^N\frac{(2\pi mE)^{\frac{3N}{2}}}{\Gamma(\frac{3N}{2})} $$

And putting it all together:

$$ \rho(p,q) = \Delta E\frac{1}{EN!h^{3N}} \cdot V^N\frac{(2\pi mE)^{\frac{3N}{2}}}{\Gamma(\frac{3N}{2})} $$

Fixed temperature (canonical)

Since particles are independent, $Z=\frac{Z_i^N}{N!}$

$$Z_1(\beta) = \frac{1}{h^{3}} \int e^{-\beta H(p,q)} dp_i\ldots dp_n,dq_i\ldots dq_n = V^N\cdot \left(\int_{-\infty}^{\infty} e^{\frac{-\beta p_i^2}{2m}}\right)^{3} = V\cdot\left(\frac{\sqrt{2\pi mT}}{h}\right)^3 = V\cdot\left(\frac{\sqrt{mT}}{\sqrt{2\pi}\hbar}\right)^3 = V\cdot\left(\frac{mT}{2\pi\hbar^2}\right)^{\frac{3}{2}} $$

Defining the thermal de-Broglie wavelength as:

$$\lambda = \left(\frac{2\pi\hbar^2}{mT}\right)^{\frac{1}{2}}$$

with dimensions, where $E=ML^2T^{−2}$, of $\left(E^2T^2M^{-1}E^{-1}\right)^{\frac{1}{2}} = \left(ML^2T^{−2}T^2M^{-1}\right)^{\frac{1}{2}}=L $

we then have:

$$ Z_i(\beta) = \frac{V}{\lambda^3}$$

$$ p = -\pd{A}{V} = -\pd{-T\log Z}{V} = NT\pd{}{V}\log V = \frac{NT}{V}$$

(This is the usually seen as $pV = Nk_BT$.)

$$ E = -\pd{\log Z}{\beta} = \frac{3}{2}NT$$

Then note that energy per particle is then $\frac{3}{2}T=\frac{\dot{q}^2}{2m}$, where $\dot{q}$ is the momentum of a single particle, so that $\sqrt{3Tm}=\dot{q}$

But recalling that $\lambda_{dB}=\frac{h}{p}$ is the de Broglie wavelength, we see why $\lambda$ is called the thermal de Broglie wavelength, since it is, up to a constant, the wavelength of the average particle.

Equation of state

The equation of state for the ideal gas is known as the Sackur-Tetrode formula. We can derive it as follows:

$$ S = -\pd{A}{T} = \pd{}{T} T\left(\log \left(\frac{V}{\lambda^3}\right)^N - \log N!\right) = \pd{}{T} T\left(\log \left(\frac{V}{\lambda^3}\right)^N - N\log N + N \right) = \pd{}{T} TN\left(\log \left(\frac{V}{\lambda^3}\right) - \log N + 1 \right) $$

$$ = \pd{}{T} TN\left(\log \left(\frac{V}{N\lambda^3}\right) + 1 \right) $$

Now note that $\pd{}{T} N\left(\log \left(\frac{V}{N\lambda^3}\right) + 1 \right) = -N\pd{}{T}\log(\lambda^3)=-N\pd{}{T}\log\left(\frac{2\pi\hbar^2}{mT}\right)^{\frac{3}{2}}=\frac{3}{2}N$


$$ S = \pd{}{T} TN\left(\log \left(\frac{V}{N\lambda^3}\right) + 1 \right) = N\left(\log \left(\frac{V}{N\lambda^3}\right) + \frac{5}{2} \right) $$

Quantum Harmonic Oscillator

The state space is $\mathbb{N}$, the natural numbers, with $E(n)=(n+\frac{1}{2})\hbar\omega$. Then:

$$Z = \sum_{n=0}^{\infty} e^{-\beta(n+\frac{1}{2})\hbar\omega} = e^{-\beta\frac{1}{2}\hbar\omega}\sum_{n=0}^{\infty} \left(e^{-\beta \hbar\omega}\right)^n = e^{-\beta\frac{1}{2}\hbar\omega} \frac{1}{1- e^{-\beta \hbar\omega}} = \frac{1}{e^{\frac{\beta\hbar\omega}{2}}- e^{-\frac{\beta \hbar\omega}{2}}} $$

Proceeding in the normal fashion, we obtain as energy and heat capacity:

$$ E = \hbar\omega\left( \frac{1}{2} + \frac{1}{e^{\beta\hbar\omega}-1} \right) $$

$$ C_v = \frac{\omega ^2 \hbar ^2 e^{\frac{\omega \hbar }{T}}}{T^2 \left(e^{\frac{\omega \hbar }{T}}-1\right)^2} $$

In [214]:
# C_v plot
<sympy.plotting.plot.Plot at 0x115229e80>

At low $T$, we can see that $E$ becomes increasingly constant (in $T$), but at high $T$, becomes linear in $T$ (you need to consider the locally linear region near $\beta=0$, since otherwise we get division by $0$. This is the thermodynamic behaviour predicted by the corresponding classical model.

Further note that $\omega = \sqrt{\frac{k}{m}}$, so similar remarks hold for oscillators with a strong spring constant, for example.

Note: one example of a physically useful consequence of this: a gas of diatomic molecules should have vibrational energy from the molecules (which are harmonic oscillators), but if their spring constant is high enough relative to the temperature of the system, we can treat them as frozen. More generally, we get locking of certain degrees of freedom. This explains why macroscopic systems have lower energy than classical statistical mechanics predicts, and similarly the heat capacity of solids (lattices of harmonic oscillators) was overestimated classically. Another example is a vibrating string. If each complex exponential (harmonic oscillator) in its Fourier series had energy equal to the temperature, the total energy would be unbounded. We need instead to say that the high frequency oscillators have exponentially small temperature.

Models of solids

For simplicity, assume we have a 1D lattice (same ideas apply in 3D). The configuration of the lattice can be represented by a vector $\textbf{q}$, giving the displacement from equilibrium of each atom. Moreover assume periodic boundary conditions so that $q_{N+1}=q_N$. Kinetic energy is then straightforward, namely $\sum_i^N \frac{p_i^2}{2m}$.

The problem is the potential, which could be enormously complex. What we can do is to consider the low temperature case, where displacements are small, and so a Taylor expansion is justified, with:

$$ U(q) \approx U(q_0) +\left(q-q_0\right)\left(\nabla U|_{q_0}\right) + \left(q-q_0\right)^T\left(\nabla^2U|_{q_0} \right)\left(q-q_0\right) $$

We can then diagonalize $H=\left(\nabla^2U|_{q_0} \right)$ to obtain a new formula for $N$ independent oscillators, which allows us to express the partition function as a sum of independent harmonic oscillators (either quantum or classical).

The next step is to approximate the sum by an integral, which requires us to provide a density $\sigma(\omega)d\omega$ over frequencies for our harmonic oscillators. Notice that this is a continuum approximation of the solid.

The Debye model provides one choice of $\sigma$.

The much simpler Einstein model of a solid, which qualitatively captures the fact that the heat capacity drops with temperature, is to treat a solid as a set of $3N$ harmonic oscillators, all sharing the same $\omega$.

Gas with quantized energy levels

(These lecture notes were helpful here)

Suppose that we have $N$ particles, energy levels denoted $e(i)$, and number of particles per level denoted $R(n_i)$ (in a full state of the system $R$. Then $N=\sum_ie(i)R_n(i)$ in that state. Assume that temperature (and chemical potential) are fixed.

$$ \langle n_j \rangle = \frac{1}{\sum_R e^{-\beta\sum_ie(i)R(n_i)}}\sum_R R(n_j) e^{-\beta\sum_ie(i)R(n_i)} = -\frac{1}{\beta}\pd{}{e(j)}\log Z $$

Then under different assumptions we obtain different values of $Z$, and can proceed from there.

With distinguishable particles (Maxwell-Boltzmann statistics)

If the particles are distinguishable, we have:

$$ \log Z = \log \xi^N = N \log \xi $$

where $\xi = \sum_ie^{-\beta e(i)}$


$$ \langle n_j \rangle = -\frac{1}{\beta}\pd{}{e(j)}\log Z = -\frac{1}{\beta}\pd{}{e(j)}N\log \sum_ie^{-\beta e(i)} = N \frac{e^{-\beta e(j)}}{\sum_ie^{-\beta e(i)}} $$

With indistinguishable bosons (Planck)

With indistinguishable bosons with no fixed number (Bose-Einstein statistics)

Here we have to use the Grand Canonical Ensemble, namely:

with $R(E) = \sum_ie(i)R(n_i)$ and $R(N) = \sum_i R(n_i)$:

$$\mathcal{Z} = \sum_R e^{-\beta (R(e) - \mu R(N) )} = \sum_R e^{-\beta (\sum_i e(i)R(n_i) - \mu R(n_i) )} = \sum_{n_1=0}^{\infty} e^{-\beta(e(1)-\mu)n_1}\cdot\sum_{n_2=0}^{\infty} e^{-\beta(e(2)-\mu)n_2}\ldots $$

$$ = \frac{1}{1- e^{-\beta(e(1)-\mu)}}\cdot \frac{1}{1- e^{-\beta(e(2)-\mu)}}\ldots $$


$$ \log \mathcal{Z} = -\sum_{r=1}^{\infty} \log(1 - e^{-\beta(e(1)-\mu)}) $$


$$ \langle N \rangle = \frac{1}{\beta}\pd{}{\mu}\log \mathcal{Z} = \sum_{r=1}^{\infty} \frac{e^{-\beta(e(r)-\mu)}}{(1 - e^{-\beta(e(r)-\mu)})} = \sum_{r=1}^{\infty} \frac{1}{e^{\beta(e(r)-\mu)} - 1} $$

Similarly: $\langle n_j \rangle = \frac{1}{e^{\beta(e(j)-\mu)} - 1} $

Note that since $\forall j: \langle n_j \rangle \gt 0$, we have $\forall j: E_j\gt \mu$. In particular, as $\mu$ approaches the lowest energy $E_0$, $\langle n_0 \rangle \to \infty$ and so almost all of the particles reside in the ground state. This is Bose-Einstein condensation, and leads to weird macroscopic effects of quantum behaviour.

With indistinguishable fermions with no fixed number (Fermi-Dirac statistics)

Now assume the particles are Fermions. Proceeding in the same fashion, but summing $n_i$ up to $1$, not to infinity, in order to respect the Pauli exclusion principle for Fermions, we obtain:

$$ \langle n_j \rangle = \frac{1}{e^{\beta(e(j)-\mu)} + 1} $$

This is a useful model of electron and hole density in semiconductors. Note that $\langle n_j \rangle \leq 1$, as we would want.

In [218]:
x = sp.symbols("x")
plot(1 / (sp.exp(x) + 1))
<sympy.plotting.plot.Plot at 0x1150320f0>

Relaxation from quantum to classical

Assume the temperature is large in either a boson or fermion gas, but fix $N$. Then high energy states become popular, and $e^{-\beta(e(j)-\mu)}$ must become small, in order to keep $N$ constant.

$$\langle n_j \rangle = \frac{1}{e^{\beta(e(j)-\mu)} \pm 1} \to e^{-\beta(e(j)-\mu)} $$

But note that

$$ \langle N \rangle = \sum_i \langle n_i \rangle = \sum_ie^{-\beta(e(j)-\mu)} = e^{\beta\mu}\sum_ie^{-\beta e(j)} $$

$$ \Rightarrow \langle n_i \rangle = e^{-\beta(e(j)-\mu)} = e^{-\beta e(j)}e^{\beta\mu} = e^{-\beta e(j)}\frac{\langle N \rangle}{\sum_ie^{-\beta e(j)}} $$

Further, $\langle N \rangle$ converges to $N$, and so we get the Maxwell-Boltzmann statistics.

Simple random walk

The simplest random walk is a series $n$ of independent unbiased Bernoulli trials, so if $X\sim Bin(p=0.5,n=n)$, then $Y = 2X-1$ is the random variable corresponding to the observable of length of chain, if we're modeling a polymer chain for example.

Deriving the diffusion equation

For a simple random walk, we can describe not only the macroscopic equilibrium state, but also the approach to equilibrium, which is diffusion.

For a density $\rho(x,t)$, the following equation describes diffusion:

$$ \frac{\partial \rho}{\partial t} = a\frac{\partial^2 \rho}{\partial x^2} $$

This can be seen to be the continuum limit of random walks. The following derivation is from Sethna's excellent Statistical Mechanics: Entropy, Order Parameters, and Complexity.

Let $X(t)$ be the random variable (with density $\rho(x, t)$) for a particle's position at time $t$, and assume a discrete time step $\Delta t$, so that $X(t+\Delta t)=X(t)+L$, where $L$ is a random variable representing a step, with $E[L]=0$ and density $\chi$.

The density $\rho(x, t+\Delta t)$ is then the sum of the random variables $X(t)$ and $L$, which is the convolution of their densities:

$$\rho(x, t+\Delta t) = \int_{-\infty}^{\infty}\rho(x-z,t)\cdot \chi(z)dz$$

If we assume that the step size is very small compared to the varying of $\rho$ with $x$ then we can Taylor expand:

$$\rho(x, t+\Delta t) = \int_{-\infty}^{\infty}\left( \rho(x,t)-\frac{\partial \rho}{\partial x}z + \frac{\partial^2\rho}{\partial x^2}\frac{z^2}{2} \right)\cdot \chi(z)dz$$

$$ = \rho(x,t)\int_{-\infty}^{\infty}\chi(z)dz - \frac{\partial \rho}{\partial x}\int_{-\infty}^{\infty}\left(z \right)\cdot \chi(z)dz + \frac{1}{2}\frac{\partial^2\rho}{\partial x^2}\int_{-\infty}^{\infty}\left(z^2 \right)\cdot \chi(z)dz $$

$$ = \rho + \frac{1}{2}\frac{\partial^2\rho}{\partial x^2}\cdot Var[L] $$

Now suppose that we also assume that $\rho$ changes very little in each time step. Then, in the limit we can equate $\rho(x,t+\Delta t)-\rho(x,t)$ with $\frac{\partial\rho}{\partial t}\Delta t$

Putting together these two interpretations for $\rho(x,t+\Delta t)$ gives:

$$\frac{\partial\rho}{\partial t}= \frac{Var[L]}{2\Delta t} \frac{\partial^2\rho}{\partial x^2} $$

which not only derives the diffusion equation, but gives us the form of $a$.

Conserved current

A closely connected relation between discrete and continuum systems is as follows: let $\rho(x)\Delta x$ be the density of the quantity of interest, and $J(x)$ the rate of this quantity moving through $x$. Then

$$\frac{d \rho(x)\Delta x}{dt} = J(x)-J(x+\Delta x) \Rightarrow \frac{d \rho(x)}{dt} = -\frac{J(x+\Delta x)-J(x)}{\Delta x} \to -\frac{dJ}{dx} $$

Given conserved current and a further assumption that $J=-a\frac{\partial \rho}{\partial t}$, we recover the diffusion equation. This second assumption is, Sethna notes, sensible in the context of perturbations of equilibrium systems, which can be studied more formally.