MLMC: Machine Learning Monte Carlo

Sam Foreman
[email protected]

ALCF

2023-07-31

MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

2023-07-31 @ Lattice 2023

Overview

  1. Background: {MCMC,HMC}
    • Leapfrog Integrator
    • Issues with HMC
    • Can we do better?
  2. L2HMC: Generalizing MD
    • 4D SU(3) Model
    • Results
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

🎯 Goal

Generate independent samples $\{x_{i}\}$ such that $\{x_{i}\} \sim p(x) \propto e^{-S(x)}$, where $S(x)$ is the action (or potential energy)

  • Want to calculate observables $\mathcal{O}$:
    $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\, \mathcal{O}(x)\, p(x)$

If these were independent, we could approximate: $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})$
$\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}\left[\mathcal{O}(x)\right] \Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$


    Markov Chain Monte Carlo (MCMC)


    Instead, nearby configurations are correlated, and we incur a factor of the integrated autocorrelation time $\tau^{\mathcal{O}}_{\mathrm{int}}$: $\sigma_{\mathcal{O}}^{2} = \frac{\tau^{\mathcal{O}}_{\mathrm{int}}}{N}\,\mathrm{Var}\left[\mathcal{O}(x)\right]$

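As a rough illustration of how this correction is used in practice, here is a minimal numpy sketch (not from the talk) that estimates $\tau_{\mathrm{int}}$ with a naive windowed sum and plugs it into the error formula; the window length and variable names are illustrative.

```python
import numpy as np

def integrated_autocorr_time(obs: np.ndarray, window: int = 100) -> float:
    """Naive windowed estimate of tau_int for a 1D chain of measurements O(x_n)."""
    x = obs - obs.mean()
    var, n = x.var(), len(x)
    # normalized autocorrelation rho(t) for t = 1 .. window - 1
    rho = [np.dot(x[: n - t], x[t:]) / ((n - t) * var) for t in range(1, window)]
    return 1.0 + 2.0 * np.sum(rho)

obs = np.random.randn(10_000)                      # stand-in for a chain of measurements
tau_int = integrated_autocorr_time(obs)
# sigma_O^2 = (tau_int / N) * Var[O(x)]
sigma_O = np.sqrt(tau_int * obs.var() / len(obs))
print(f"tau_int ~ {tau_int:.2f}, sigma_O ~ {sigma_O:.4f}")
```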

      Hamiltonian Monte Carlo (HMC)

      • Want to (sequentially) construct a chain of states: $x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}$

        such that, as $N \rightarrow \infty$: $\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow{N\rightarrow\infty} p(x) \propto e^{-S(x)}$

      🪄 Trick

      • Introduce fictitious momentum $v \sim \mathcal{N}(0, \mathbb{1})$
        • Normally distributed, independent of $x$, i.e.
          $p(x, v) = p(x)\, p(v) \propto e^{-S(x)}\, e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}v\right]} = e^{-H(x, v)}$

      Hamiltonian Monte Carlo (HMC)

      • Idea: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$ ❗

      • Write the joint distribution $p(x, v)$:
        $p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}$

      🔋 Hamiltonian Dynamics

      $H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H$

      Figure 1: Overview of HMC algorithm

      Leapfrog Integrator (HMC)

      🔋 Hamiltonian Dynamics

      $\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$

      🐸 Leapfrog Step

      input $\left(x, v\right) \rightarrow \left(x', v'\right)$ output

      $\begin{aligned}
      \tilde{v} &:= \Gamma(x, v) = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
      x' &:= \Lambda(x, \tilde{v}) = x + \varepsilon\, \tilde{v} \\
      v' &:= \Gamma(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x')
      \end{aligned}$

      ⚠️ Warning!

      • Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$ at the beginning of each trajectory

      Note: $\partial_{x} S(x)$ is the force
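
A minimal numpy sketch of a single leapfrog step as written above; the quadratic action in the example is purely illustrative.

```python
import numpy as np

def leapfrog_step(x, v, eps, grad_S):
    """One leapfrog step (x, v) -> (x', v') for H(x, v) = S(x) + v^T v / 2."""
    v_tilde = v - 0.5 * eps * grad_S(x)          # Gamma: half-step kick from the force
    x_new = x + eps * v_tilde                    # Lambda: full-step drift
    v_new = v_tilde - 0.5 * eps * grad_S(x_new)  # Gamma: second half-step kick
    return x_new, v_new

# illustrative harmonic action S(x) = x^2 / 2, so that grad_S(x) = x
x, v = np.array([1.0]), np.array([0.5])
x, v = leapfrog_step(x, v, eps=0.1, grad_S=lambda x: x)
```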

      HMC Update

      • We build a trajectory of $N_{\mathrm{LF}}$ leapfrog steps¹:
        $(x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v')$

      • And propose $x'$ as the next state in our chain

      $\begin{aligned}
      \Gamma: (x, v) \rightarrow v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
      \Lambda: (x, v) \rightarrow x' &:= x + \varepsilon v
      \end{aligned}$

      • We then accept / reject $x'$ using the Metropolis-Hastings criterion,
        $A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$


      1. We always start by resampling the momentum, $v_{0} \sim \mathcal{N}(0, \mathbb{1})$
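
Putting the pieces together, a schematic numpy version of one HMC update (not the l2hmc-qcd implementation), using the fact that leapfrog integration is volume-preserving so the Jacobian factor is 1 and the acceptance probability reduces to $\min\{1, e^{-\delta H}\}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_update(x, S, grad_S, eps=0.1, n_lf=10):
    """One HMC update: resample v0 ~ N(0, 1), integrate N_LF leapfrog steps, accept/reject."""
    v0 = rng.normal(size=x.shape)
    x_new, v_new = x.copy(), v0.copy()
    for _ in range(n_lf):
        v_new = v_new - 0.5 * eps * grad_S(x_new)
        x_new = x_new + eps * v_new
        v_new = v_new - 0.5 * eps * grad_S(x_new)
    # A(x'|x) = min{1, exp(-[H(x', v') - H(x, v0)])}
    dH = (S(x_new) + 0.5 * v_new @ v_new) - (S(x) + 0.5 * v0 @ v0)
    return x_new if rng.uniform() < np.exp(min(0.0, -dH)) else x

x = np.zeros(2)
for _ in range(100):
    x = hmc_update(x, S=lambda y: 0.5 * y @ y, grad_S=lambda y: y)
```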

        HMC Demo

        Figure 2: HMC Demo

        Issues with HMC

        • What do we want in a good sampler?
          • Fast mixing (small autocorrelations)
          • Fast burn-in (quick convergence)
        • Problems with HMC:
          • Energy levels selected randomly → slow mixing
          • Cannot easily traverse low-density zones → slow convergence

        HMC Samples with $\varepsilon = 0.25$

        HMC Samples with $\varepsilon = 0.5$
        Figure 3: HMC Samples generated with varying step sizes $\varepsilon$

        Topological Freezing

        Topological Charge: $Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}$

        note: $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$
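
A small numpy sketch of these two definitions, assuming `x_plaq` holds the plaquette angles $x_{P}$ of a 2D U(1) configuration (for a genuine gauge configuration the sum is integer up to floating-point error):

```python
import numpy as np

def project_angle(x):
    """floor(x_P): map an angle into [-pi, pi)."""
    return x - 2 * np.pi * np.floor((x + np.pi) / (2 * np.pi))

def topological_charge(x_plaq):
    """Q = (1 / 2pi) * sum_P floor(x_P), rounded to the nearest integer."""
    return np.rint(project_angle(x_plaq).sum() / (2 * np.pi))
```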

        ⏳ Critical Slowing Down

        • $Q$ gets stuck!
          • as $\beta \longrightarrow \infty$:
            • $Q \longrightarrow \text{const.}$
            • $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \Longrightarrow$
          • # of configs required to estimate errors
            grows exponentially: $\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$


        Note: $\delta Q \rightarrow 0$ at increasing $\beta$

        Can we do better?

        • Introduce two invertible NNs, vNet and xNet:
          • vNet: $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$
          • xNet: $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$

        • Use these $(s, t, q)$ in the generalized MD update:
          • $\Gamma_{\theta}^{\pm}: (x, v) \xrightarrow{s_{v},\, t_{v},\, q_{v}} (x, v')$
          • $\Lambda_{\theta}^{\pm}: (x, v) \xrightarrow{s_{x},\, t_{x},\, q_{x}} (x', v)$

        Figure 4: Generalized MD update where $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ are invertible NNs

          L2HMC: Generalizing the MD Update

          • Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update

            1. $v' = \Gamma^{\pm}(x, v)$  (update $v$)

            2. $x' = x_{B} + \Lambda^{\pm}(x_{A}, v')$  (update first half: $x_{A}$)

            3. $x'' = x'_{A} + \Lambda^{\pm}(x'_{B}, v')$  (update other half: $x_{B}$)

            4. $v'' = \Gamma^{\pm}(x'', v')$  (update $v$)

          • Resample both $v \sim \mathcal{N}(0, \mathbb{1})$ and $d \sim \mathcal{U}(\pm)$ at the beginning of each trajectory
            • To ensure ergodicity + reversibility, we split the $x$ update into sequential (complementary) updates
            • Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$
          Figure 5: Generalized MD update with $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ invertible NNs
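
To make the structure of steps 1 through 4 concrete, here is a schematic numpy sketch of one generalized leapfrog layer in the forward ($d = +$) direction, using the $\Gamma^{+}$/$\Lambda^{+}$ expressions shown later in these slides; the network interfaces, the binary `mask` splitting $x$ into the two halves, and the toy usage are illustrative assumptions rather than the actual l2hmc-qcd code:

```python
import numpy as np

def gamma_plus(v, force, s, t, q, eps):
    """Generalized momentum update (forward direction)."""
    return v * np.exp(0.5 * eps * s) - 0.5 * eps * (force * np.exp(eps * q) + t)

def lambda_plus(x, v, s, t, q, eps):
    """Generalized position update (forward direction)."""
    return x * np.exp(0.5 * eps * s) - 0.5 * eps * (v * np.exp(eps * q) + t)

def leapfrog_layer(x, v, grad_S, v_net, x_net, mask, eps):
    """One forward leapfrog layer: update v, then x_A, then x_B, then v again.
    v_net(x, F) and x_net(x, v) return (s, t, q); mask == 1 selects the x_A half."""
    v = gamma_plus(v, grad_S(x), *v_net(x, grad_S(x)), eps)
    x = (1 - mask) * x + mask * lambda_plus(x, v, *x_net((1 - mask) * x, v), eps)   # update x_A
    x = mask * x + (1 - mask) * lambda_plus(x, v, *x_net(mask * x, v), eps)         # update x_B
    v = gamma_plus(v, grad_S(x), *v_net(x, grad_S(x)), eps)
    return x, v

# toy usage: constant "networks" and a checkerboard mask
zeros3 = lambda *_: (0.0, 0.0, 0.0)
x, v = np.random.randn(8), np.random.randn(8)
x, v = leapfrog_layer(x, v, grad_S=lambda y: y, v_net=zeros3, x_net=zeros3,
                      mask=(np.arange(8) % 2), eps=0.1)
```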

          L2HMC: Leapfrog Layer

          L2HMC Update

          👨‍💻 Algorithm

          1. input: $x$

            • Resample: $v \sim \mathcal{N}(0, \mathbb{1})$;  $d \sim \mathcal{U}(\pm)$
            • Construct initial state: $\xi = (x, v, \pm)$
          2. forward: Generate proposal $\xi'$ by passing initial $\xi$ through $N_{\mathrm{LF}}$ leapfrog layers
            $\xi \xrightarrow{\mathrm{LF\ layer}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \xi' := (x'', v'')$

            • Accept / Reject: $A(\xi'|\xi) = \min\left\{1, \frac{\pi(\xi')}{\pi(\xi)} \left|\mathcal{J}\left(\xi', \xi\right)\right|\right\}$
          3. backward (if training):

            • Evaluate the loss function¹ $\mathcal{L} \gets \mathcal{L}_{\theta}(\xi', \xi)$ and backprop
          4. return: $x_{i+1}$
            Evaluate the MH criterion and return the accepted config,
            $x_{i+1} \gets \begin{cases} x'' & \text{w/ prob } A(\xi''|\xi) \quad ✅ \\ x & \text{w/ prob } 1 - A(\xi''|\xi) \quad 🚫 \end{cases}$

          Figure 6: Leapfrog Layer used in generalized MD update

          1. For the simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$
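
A schematic PyTorch sketch of one training step following this algorithm, assuming a hypothetical `model(x, v)` that applies one leapfrog layer and returns the updated state together with its log-Jacobian contribution, and using the toy $\mathbb{R}^{2}$ loss from the footnote; none of these names correspond to the actual l2hmc-qcd API:

```python
import torch

def train_step(x, model, action, optimizer, n_lf=8):
    """One schematic L2HMC training step with the toy loss L = -A(xi'|xi) * (x' - x)^2."""
    v = torch.randn_like(x)                                  # 1. resample v ~ N(0, 1)
    x_prop, v_prop = x, v
    logdet = torch.zeros(x.shape[0])
    for _ in range(n_lf):                                    # 2. forward: N_LF leapfrog layers
        x_prop, v_prop, ld = model(x_prop, v_prop)
        logdet = logdet + ld
    h_init = action(x) + 0.5 * (v ** 2).sum(dim=-1)
    h_prop = action(x_prop) + 0.5 * (v_prop ** 2).sum(dim=-1)
    acc = torch.exp(torch.clamp(h_init - h_prop + logdet, max=0.0))   # A(xi'|xi)
    loss = -(acc * ((x_prop - x) ** 2).sum(dim=-1)).mean()   # 3. backward: evaluate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4. return: accept x'' with probability A, otherwise keep x
    accept = (torch.rand_like(acc) < acc).unsqueeze(-1)
    return torch.where(accept, x_prop.detach(), x), loss.item()
```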

            4D SU(3) Model

            🔗 Link Variables

            • Write link variables $U_{\mu}(x) \in SU(3)$:

              $U_{\mu}(x) = \exp\left[i\, \omega^{k}_{\mu}(x)\, \lambda^{k}\right] = e^{i Q}, \quad \text{with} \quad Q \in \mathfrak{su}(3)$

              where $\omega^{k}_{\mu}(x) \in \mathbb{R}$, and $\lambda^{k}$ are the generators of $SU(3)$

            🏃‍♂️‍➡️ Conjugate Momenta

            • Introduce $P_{\mu}(x) = P^{k}_{\mu}(x)\, \lambda^{k}$, conjugate to $\omega^{k}_{\mu}(x)$

            🟥 Wilson Action

            $S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]$

            where $U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x+\hat{\mu})\, U^{\dagger}_{\mu}(x+\hat{\nu})\, U^{\dagger}_{\nu}(x)$

            Figure 7: Illustration of the lattice

            HMC: 4D SU(3)

            Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

            • $U$ update: $\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$
              $\begin{aligned}
              Q(\varepsilon) &= Q(0) + \varepsilon P(0) \Longrightarrow \\
              -i\, \log U(\varepsilon) &= -i\, \log U(0) + \varepsilon P(0) \\
              U(\varepsilon) &= e^{i\, \varepsilon P(0)}\, U(0) \Longrightarrow \\
              \Lambda:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'}\, U
              \end{aligned}$
            • $P$ update: $\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow$
              $\begin{aligned}
              P(\varepsilon) &= P(0) - \varepsilon \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \varepsilon\, F[U] \\
              \Gamma:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U]
              \end{aligned}$

            $\varepsilon$ is the step size

            $F[U]$ is the force term

            HMC: 4D SU(3)

            • Momentum Update: $\Gamma: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$

            • Link Update: $\Lambda: U \longrightarrow U' := e^{i\varepsilon P'}\, U$

            • We maintain a batch of $N_{b}$ lattices, all updated in parallel

              • U.dtype = complex128
              • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
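
A small PyTorch sketch of what this batched layout and the two updates might look like; the lattice sizes are illustrative, and the force `F` is left as an input rather than computed from the action:

```python
import torch

Nb, Nt, Nx, Ny, Nz = 2, 4, 4, 4, 4
# batch of identity SU(3) gauge fields: U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
U = torch.eye(3, dtype=torch.complex128).expand(Nb, 4, Nt, Nx, Ny, Nz, 3, 3).clone()
P = torch.zeros_like(U)          # conjugate momenta (algebra-valued; zero here for illustration)

def link_update(U, P, eps):
    """Lambda: U -> exp(i * eps * P) @ U, applied to every link in the batch."""
    return torch.linalg.matrix_exp(1j * eps * P) @ U

def momentum_update(P, F, eps):
    """Gamma: P -> P - (eps / 2) * F[U], with F the force term."""
    return P - 0.5 * eps * F

U = link_update(U, P, eps=0.1)   # with P = 0 this leaves U unchanged
```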

            Networks 4D SU(3)

            $U$-Network:

            UNet: $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$

            $P$-Network:

            PNet: $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$


            ↑
            let’s look at this

            $P$-Network (pt. 1)

            • input¹: $\left(U, F\right) := (e^{i Q}, F)$
              $\begin{aligned}
              h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\
              h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\
              &\vdots \\
              h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\
              z &:= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow
              \end{aligned}$
            • output²: $(s_{P}, t_{P}, q_{P})$

              • $s_{P} = \lambda_{s} \tanh(w_{s} z + b_{s})$
              • $t_{P} = w_{t} z + b_{t}$
              • $q_{P} = \lambda_{q} \tanh(w_{q} z + b_{q})$

            1. $\sigma(\cdot)$ denotes an activation function
            2. $\lambda_{s}, \lambda_{q} \in \mathbb{R}$ are trainable parameters
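
A schematic PyTorch sketch of such a network; the layer sizes, the choice of ReLU for $\sigma(\cdot)$, and the flattened input shapes are illustrative assumptions, not the actual l2hmc-qcd architecture:

```python
import torch
import torch.nn as nn

class PNetwork(nn.Module):
    """Schematic P-network: (Q, F) -> (s_P, t_P, q_P) on flattened inputs."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.wQ = nn.Linear(dim, hidden)                 # w_Q Q + b
        self.wF = nn.Linear(dim, hidden)                 # w_F F
        self.body = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.ws, self.wt, self.wq = (nn.Linear(hidden, dim) for _ in range(3))
        self.lambda_s = nn.Parameter(torch.ones(1))      # trainable scale for s_P
        self.lambda_q = nn.Parameter(torch.ones(1))      # trainable scale for q_P

    def forward(self, Q, F):
        h0 = torch.relu(self.wQ(Q) + self.wF(F))         # sigma(w_Q Q + w_F F + b)
        z = self.body(h0)                                # hidden layers h_1 ... h_n -> z
        s = self.lambda_s * torch.tanh(self.ws(z))       # s_P = lambda_s tanh(w_s z + b_s)
        t = self.wt(z)                                   # t_P = w_t z + b_t
        q = self.lambda_q * torch.tanh(self.wq(z))       # q_P = lambda_q tanh(w_q z + b_q)
        return s, t, q

net = PNetwork(dim=8)
s_P, t_P, q_P = net(torch.randn(4, 8), torch.randn(4, 8))
```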

              $P$-Network (pt. 2)

              • Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)$¹:

                • forward $(d = +)$: $\Gamma^{+}(U, P) := P_{+} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]$

                • backward $(d = -)$: $\Gamma^{-}(U, P) := P_{-} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}$


              1. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$
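
A small numpy sketch of these two updates, with a numerical check that $\Gamma^{-}$ really inverts $\Gamma^{+}$ for fixed $(F, s_{P}, t_{P}, q_{P})$; the scalar values are illustrative:

```python
import numpy as np

def gamma_plus(P, F, s, t, q, eps):
    """Forward update: P_+ = P * exp(eps/2 * s_P) - eps/2 * (F * exp(eps * q_P) + t_P)."""
    return P * np.exp(0.5 * eps * s) - 0.5 * eps * (F * np.exp(eps * q) + t)

def gamma_minus(P, F, s, t, q, eps):
    """Backward update: P_- = exp(-eps/2 * s_P) * (P + eps/2 * (F * exp(eps * q_P) + t_P))."""
    return np.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * np.exp(eps * q) + t))

# numerical check that (Gamma^+)^{-1} = Gamma^- for fixed (F, s, t, q)
P, F = np.random.randn(5), np.random.randn(5)
s, t, q, eps = 0.3, 0.1, -0.2, 0.1
assert np.allclose(gamma_minus(gamma_plus(P, F, s, t, q, eps), F, s, t, q, eps), P)
```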

                Results: 2D U(1)

                📈 Improvement

                We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the trained model vs. HMC.

                Note: lower is better

                Interpretation

                Deviation in $x_{P}$

                Topological charge mixing

                Artificial influx of energy

                Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory.

                Interpretation

                Average plaquette: $\langle x_{P}\rangle$ vs. LF step

                Average energy: $H - \sum\log|\mathcal{J}|$
                Figure 9: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

                4D SU(3) Results

                • Distribution of $\log|\mathcal{J}|$ over all chains, at each leapfrog step $N_{\mathrm{LF}}$ $(= 0, 1, \ldots, 8)$ during training:
                Figure 10: 100 train iters
                Figure 11: 500 train iters
                Figure 12: 1000 train iters

                4D SU(3) Results: $\delta U_{\mu\nu}$

                Figure 13: The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$ between the trained model and HMC

                4D SU(3) Results: $\delta U_{\mu\nu}$

                Figure 14: The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$ between the trained model and HMC

                Next Steps

                • Further code development

                  • saforem2/l2hmc-qcd
                • Continue to use / test different network architectures

                  • Gauge equivariant NNs for the $U_{\mu}(x)$ update
                • Continue to test different loss functions for training

                • Scaling:

                  • Lattice volume
                  • Network size
                  • Batch size
                  • # of GPUs

                Thank you!

                 

                samforeman.me

                saforem2

                @saforem2

                [email protected]

                🙏 Acknowledgements

                This research used resources of the Argonne Leadership Computing Facility,
                which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


                Acknowledgements

                • Links:
                  • Link to github
                  • reach out!
                • References:
                  • Link to slides
                    • link to github with slides
                  • (Foreman et al. 2022; Foreman, Jin, and Osborn 2022, 2021)
                  • (Boyda et al. 2022; Shanahan et al. 2022)
                • Huge thank you to:
                  • Yannick Meurice
                  • Norman Christ
                  • Akio Tomiya
                  • Nobuyuki Matsumoto
                  • Richard Brower
                  • Luchang Jin
                  • Chulwoo Jung
                  • Peter Boyle
                  • Taku Izubuchi
                  • Denis Boyda
                  • Dan Hackett
                  • ECP-CSD group
                  • ALCF Staff + Datascience Group

                Links

                • saforem2/l2hmc-qcd

                • 📊 slides (Github: saforem2/lattice23)

                References

                • Title Slide Background (worms) animation
                  • Github: saforem2/grid-worms-animation
                • Link to HMC demo

                References


                Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
                Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
                Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
                ———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
                Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

                Extras

                Integrated Autocorrelation Time

                Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

                Comparison

                (a) Trained model
                (b) Generic HMC
                Figure 16: Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for the trained model (a) vs. HMC (b)

                Plaquette analysis: $x_{P}$

                Deviation from the $V\rightarrow\infty$ limit, $x_{P}^{\ast}$

                Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted lines)

                Figure 17: Plot showing how the average plaquette $\left\langle x_{P}\right\rangle$ varies over a single trajectory for models trained at different $\beta$, with varying trajectory lengths $N_{\mathrm{LF}}$

                Loss Function

                • Want to maximize the expected squared charge difference¹:
                  $\mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\big[-\delta Q^{2}\left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big]$

                • Where:

                  • $\delta Q$ is the tunneling rate: $\delta Q(\xi^{\ast},\xi) = \left|Q^{\ast} - Q\right|$

                  • $A(\xi^{\ast}|\xi)$ is the probability² of accepting the proposal $\xi^{\ast}$:
                    $A(\xi^{\ast}|\xi) = \min\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$


                1. Where $\xi^{\ast}$ is the proposed configuration (prior to Accept / Reject)
                2. And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the Jacobian of the transformation $\xi \rightarrow \xi^{\ast}$
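
A minimal PyTorch sketch of this loss, assuming the topological charges and acceptance probabilities have already been computed per chain:

```python
import torch

def l2hmc_loss(q_init, q_prop, acc_prob):
    """Minimize the negative of E[ delta_Q^2 * A(xi*|xi) ], i.e. maximize expected tunneling."""
    delta_q = torch.abs(q_prop - q_init)        # delta Q = |Q* - Q|
    return -(delta_q ** 2 * acc_prob).mean()

loss = l2hmc_loss(torch.tensor([0.0, 1.0]), torch.tensor([1.0, 1.0]), torch.tensor([0.8, 0.9]))
```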

                  Networks 2D U(1)

                  • Stack gauge links as shape$\left(U_{\mu}\right)$ = [Nb, 2, Nt, Nx] $\in \mathbb{C}$

                    $x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$

                    with shape$\left(x_{\mu}\right)$ = [Nb, 2, Nt, Nx, 2] $\in \mathbb{R}$ (see the sketch below)

                  • $x$-Network:

                    • $\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$
                  • $v$-Network:

                    • $\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$ ⟵ let's look at this
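
A small numpy sketch of the stacking described above; the lattice sizes are illustrative:

```python
import numpy as np

Nb, Nt, Nx = 4, 8, 8
x = np.random.uniform(-np.pi, np.pi, size=(Nb, 2, Nt, Nx))    # link angles x_mu(n)
U = np.exp(1j * x)                                            # shape [Nb, 2, Nt, Nx], complex
x_stack = np.stack([np.cos(x), np.sin(x)], axis=-1)           # shape [Nb, 2, Nt, Nx, 2], real
```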

                  $v$-Update¹

                  • forward $(d = +)$:

                  $\Gamma^{+}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]$

                  • backward $(d = -)$:

                  $\Gamma^{-}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}$


                  1. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$

                    $x$-Update

                    • forward $(d = +)$:

                    $\Lambda^{+}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]$

                    • backward $(d = -)$:

                    $\Lambda^{-}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}$

                    Lattice Gauge Theory (2D U(1))

                    🔗 Link Variables

                    $U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C}$, where $x_{\mu}(n) \in [-\pi, \pi)$

                    🫸 Wilson Action

                    $S_{\beta}(x) = \beta\sum_{P} \cos x_{P},$

                    $x_{P} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]$

                    Note: $x_{P}$ is the product of links around a $1\times 1$ square, called a "plaquette"
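
A small numpy sketch of these definitions for a single configuration, assuming axis 0 of the lattice corresponds to direction $\mu = 0$ and using periodic boundary conditions via `np.roll`; the shapes are illustrative:

```python
import numpy as np

def plaquettes(x):
    """x_P = x_mu(n) + x_nu(n + mu) - x_mu(n + nu) - x_nu(n), for x.shape = [2, Nt, Nx]."""
    x0, x1 = x[0], x[1]                     # link angles in directions mu = 0 and nu = 1
    return x0 + np.roll(x1, -1, axis=0) - np.roll(x0, -1, axis=1) - x1

def wilson_action(x, beta):
    """S_beta(x) = beta * sum_P cos(x_P), as written above."""
    return beta * np.cos(plaquettes(x)).sum()

x = np.random.uniform(-np.pi, np.pi, size=(2, 8, 8))
print(wilson_action(x, beta=4.0))
```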

                    2D Lattice

                    Figure 18: Jupyter Notebook

                    Annealing Schedule

                    • Introduce an annealing schedule during the training phase:

                      $\left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}$

                      where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$

                    • Note:

                      • for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the height of the energy barriers ⟹
                      • easier for our sampler to explore previously inaccessible regions of the phase space (see the sketch below)
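
A minimal sketch of one possible such schedule, assuming (as I read this slide) that epoch $t$ trains against the rescaled action $\gamma_{t}\, S(x)$, i.e. an effective $\beta_{t} = \gamma_{t}\,\beta$; the endpoints, number of epochs, and target $\beta$ are all illustrative:

```python
import numpy as np

n_epochs = 1000
gammas = np.linspace(0.1, 1.0, n_epochs)      # gamma_0 < gamma_1 < ... < gamma_N = 1

for t, gamma in enumerate(gammas):
    beta_t = gamma * 4.0                      # hypothetical target beta = 4.0
    # ... train one epoch against the rescaled action gamma_t * S(x) (i.e. at beta_t) ...
```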


                    Toy Example: GMM $\in \mathbb{R}^{2}$

                    Figure 19

                    Physical Quantities

                    • To estimate physical quantities, we:
                      • Calculate physical observables at increasing spatial resolution
                      • Perform extrapolation to continuum limit
                    Figure 20: Increasing the physical resolution ($a \rightarrow 0$) allows us to make predictions about numerical values of physical quantities in the continuum limit.
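
A minimal numpy sketch of such a continuum extrapolation, assuming leading $\mathcal{O}(a^{2})$ discretization errors; the measurements are made-up numbers purely for illustration:

```python
import numpy as np

# hypothetical measurements of an observable at decreasing lattice spacing a (made-up numbers)
a = np.array([0.20, 0.15, 0.10, 0.05])
obs = np.array([1.42, 1.31, 1.24, 1.19])

# leading-order ansatz obs(a) = obs_cont + c * a^2; fit and read off the a -> 0 intercept
slope, obs_cont = np.polyfit(a ** 2, obs, deg=1)
print(f"continuum estimate: {obs_cont:.3f}")
```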
