MLMC: Machine Learning Monte Carlo
Overview
Hamiltonian Monte Carlo (HMC)
Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}
such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}
Leapfrog Integrator (HMC)
HMC Update
We build a trajectory of N_{\mathrm{LF}} leapfrog steps³ \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}
And propose x' as the next state in our chain
\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}
- We then accept / reject x' using the Metropolis-Hastings criterion,
A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
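As a toy illustration, here is a minimal numpy sketch of this update for the simple action S(x) = \frac{1}{2}\|x\|^{2} (the function and variable names are ours, not from any particular codebase); since the leapfrog Jacobian is 1, the accept / reject step reduces to comparing Hamiltonians:

```python
import numpy as np

def hmc_step(x, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    """One HMC update for the toy action S(x) = 0.5 * ||x||^2 (standard normal target)."""
    S = lambda z: 0.5 * np.sum(z ** 2)
    grad_S = lambda z: z                            # dS/dx for this toy action
    v = rng.normal(size=x.shape)                    # resample momentum v ~ N(0, 1)
    x_new, v_new = x.copy(), v.copy()
    H0 = S(x_new) + 0.5 * np.sum(v_new ** 2)        # initial Hamiltonian
    for _ in range(n_lf):                           # leapfrog: Gamma (half), Lambda, Gamma (half)
        v_new = v_new - 0.5 * eps * grad_S(x_new)
        x_new = x_new + eps * v_new
        v_new = v_new - 0.5 * eps * grad_S(x_new)
    H1 = S(x_new) + 0.5 * np.sum(v_new ** 2)        # proposed Hamiltonian
    if rng.uniform() < min(1.0, np.exp(H0 - H1)):   # Metropolis-Hastings accept / reject
        return x_new                                # accept x'
    return x                                        # reject: keep x

# Build a chain x_0 -> x_1 -> ... -> x_N
chain = [np.zeros(2)]
for _ in range(1000):
    chain.append(hmc_step(chain[-1]))
```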
Issues with HMC
- What do we want in a good sampler?
- Fast mixing (small autocorrelations)
- Fast burn-in (quick convergence)
- Problems with HMC:
- Energy levels selected randomly \rightarrow slow mixing
- Cannot easily traverse low-density zones \rightarrow slow convergence
Topological Freezing
Can we do better?
- Introduce two (invertible) NNs, vNet and xNet⁴:
  - vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
  - xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
- Use these (s, t, q) in the generalized MD update:
- \Gamma_{\theta}^{\pm} : ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
- \Lambda_{\theta}^{\pm} : (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)
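A rough PyTorch sketch of how these outputs enter the forward (d = +) updates, matching the expressions written out in the Extras; `vnet`, `xnet`, and the force F here are stand-ins, and the masking and full Jacobian bookkeeping of the real implementation are omitted:

```python
import torch

def Gamma_fwd(x, v, F, vnet, eps):
    """Gamma_theta^+: generalized v update driven by vNet outputs (s_v, t_v, q_v)."""
    s, t, q = vnet(x, F)                        # vNet: (x, F) -> (s_v, t_v, q_v)
    v_new = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (F * torch.exp(eps * q) + t)
    logdet = 0.5 * eps * s.sum()                # log|dv'/dv|, accumulated for the accept step
    return v_new, logdet

def Lambda_fwd(x, v, xnet, eps):
    """Lambda_theta^+: generalized x update driven by xNet outputs (s_x, t_x, q_x)."""
    s, t, q = xnet(x, v)                        # xNet: (x, v) -> (s_x, t_x, q_x)
    x_new = x * torch.exp(0.5 * eps * s) - 0.5 * eps * (v * torch.exp(eps * q) + t)
    logdet = 0.5 * eps * s.sum()
    return x_new, logdet

# Stand-in networks: any callables returning (s, t, q) with the same shape as x work here.
dummy_net = lambda a, b: (torch.tanh(a), torch.tanh(b), torch.tanh(a * b))
x, v = torch.randn(8), torch.randn(8)
F = x                                           # e.g. F = dS/dx = x for the toy action S(x) = x^2 / 2
v1, _ = Gamma_fwd(x, v, F, dummy_net, eps=0.1)
x1, _ = Lambda_fwd(x, v1, dummy_net, eps=0.1)
```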
4D SU(3) Results
- Distribution of \log|\mathcal{J}| over all chains at each leapfrog step N_{\mathrm{LF}} (= 0, 1, \ldots, 8) during training:
4D SU(3) Results: \delta U_{\mu\nu}
Next Steps
- Further code development
- Continue to use / test different network architectures
  - Gauge-equivariant NNs for the U_{\mu}(x) update
- Continue to test different loss functions for training
- Scaling:
  - Lattice volume
  - Network size
  - Batch size
  - # of GPUs
Thank you!
Acknowledgements
- Huge thank you to:
- Yannick Meurice
- Norman Christ
- Akio Tomiya
- Nobuyuki Matsumoto
- Richard Brower
- Luchang Jin
- Chulwoo Jung
- Peter Boyle
- Taku Izubuchi
- Denis Boyda
- Dan Hackett
- ECP-CSD group
- ALCF Staff + Datascience Group
Links
- Slides (GitHub: saforem2/lattice23)
Extras
Comparison
Loss Function
Want to maximize the expected squared charge difference⁹: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}
Where:
\delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}
A(\xi^{\ast}|\xi) is the probability¹⁰ of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
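A minimal PyTorch sketch of this loss; the tensor names (`q_init`, `q_prop`, `acc_prob`) are assumptions, standing in for quantities produced by the sampler:

```python
import torch

def loss_fn(q_init, q_prop, acc_prob):
    """L_theta(xi*, xi) = E_{p(xi)}[ -dQ^2(xi*, xi) * A(xi* | xi) ], estimated over a batch.

    q_init, q_prop: topological charges Q, Q* of the initial / proposed configurations
    acc_prob:       acceptance probabilities A(xi* | xi)
    """
    dq = torch.abs(q_prop - q_init)        # tunneling rate dQ = |Q* - Q|
    return (-dq ** 2 * acc_prob).mean()    # minimizing this maximizes E[dQ^2 * A]
```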
Networks 2D U(1)
Stack gauge links as \mathrm{shape}\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}, and x_{\mu}(n) := \left[\cos(x), \sin(x)\right] with \mathrm{shape}\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}
x-Network:
- \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
v-Network:
- \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \hspace{2pt}\longleftarrow let's look at this
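Before turning to the v-update, a minimal sketch of the link stacking described above (the batch and lattice sizes below are placeholders):

```python
import math
import torch

def stack_links(x):
    """Map U(1) link phases x, shape [Nb, 2, Nt, Nx], to real-valued
    network inputs [cos(x), sin(x)], shape [Nb, 2, Nt, Nx, 2]."""
    return torch.stack([torch.cos(x), torch.sin(x)], dim=-1)

# Example: a batch of Nb = 4 configurations on an Nt x Nx = 16 x 16 lattice
x = 2 * math.pi * torch.rand(4, 2, 16, 16)
print(stack_links(x).shape)   # torch.Size([4, 2, 16, 16, 2])
```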
v-Update¹¹
- forward (d = \textcolor{#FF5252}{+}):
\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]
- backward (d = \textcolor{#1A8FFF}{-}):
\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
x-Update
- forward (d = \textcolor{#FF5252}{+}):
\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]
- backward (d = \textcolor{#1A8FFF}{-}):
\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}
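A quick numerical check that \Lambda^{-} inverts \Lambda^{+} at fixed (s_{x}, t_{x}, q_{x}), using the two expressions above verbatim (random tensors stand in for the network outputs):

```python
import torch

def Lambda_fwd(x, v, s, t, q, eps):
    """Lambda^+ as written above."""
    return x * torch.exp(0.5 * eps * s) - 0.5 * eps * (v * torch.exp(eps * q) + t)

def Lambda_bwd(x, v, s, t, q, eps):
    """Lambda^-: the algebraic inverse of Lambda^+ for fixed (v, s, t, q)."""
    return torch.exp(-0.5 * eps * s) * (x + 0.5 * eps * (v * torch.exp(eps * q) + t))

x, v = torch.randn(10), torch.randn(10)
s, t, q = torch.randn(10), torch.randn(10), torch.randn(10)
xp = Lambda_fwd(x, v, s, t, q, eps=0.1)
print(torch.allclose(Lambda_bwd(xp, v, s, t, q, eps=0.1), x))   # True
```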
Annealing Schedule
Introduce an annealing schedule during the training phase:
\left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}
where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1
Note:
- for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
- easier for our sampler to explore previously inaccessible regions of the phase space
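A minimal sketch of such a schedule (the linear spacing and era count below are assumptions, not the values used in training):

```python
import numpy as np

def annealing_schedule(n_eras, gamma_0=0.1):
    """Monotone schedule gamma_0 < gamma_1 < ... < gamma_N = 1, with |gamma_{t+1} - gamma_t| << 1."""
    return np.linspace(gamma_0, 1.0, n_eras + 1)

gammas = annealing_schedule(n_eras=50)
# During training era t, target the rescaled (flattened) distribution
#   p_t(x) ∝ exp(-gamma_t * S(x)),
# so energy barriers are lower for gamma_t < 1 and the sampler can cross between modes.
```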
Physical Quantities
- To estimate physical quantities, we:
- Calculate physical observables at increasing spatial resolution
- Perform extrapolation to continuum limit
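A schematic of the extrapolation step; all numbers below are placeholders chosen purely to illustrate a leading-order \mathcal{O}(a^{2}) fit, not results from this work:

```python
import numpy as np

# Placeholder values: an observable O measured at several lattice spacings a,
# assuming leading discretization errors of O(a^2).
a = np.array([0.15, 0.12, 0.09, 0.06])          # lattice spacings (e.g. in fm)
O = np.array([0.412, 0.398, 0.389, 0.383])      # measured observable at each spacing

slope, intercept = np.polyfit(a ** 2, O, deg=1) # linear fit O(a) ~ O_cont + c * a^2
print(f"continuum estimate (a -> 0): {intercept:.3f}")
```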
Extra
Footnotes
Here, \sim means "is distributed according to"
We always start by resampling the momentum, v_{0} \sim \mathcal{N}(0, \mathbb{1})
For the simple \mathbf{x} \in \mathbb{R}^{2} example, \mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}
\sigma(\cdot) denotes an activation function
\lambda_{s}, \lambda_{q} \in \mathbb{R} are trainable parameters
Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)
Where \xi^{\ast} is the proposed configuration (prior to Accept / Reject)
And \left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right| is the Jacobian of the transformation from \xi \rightarrow \xi^{\ast}
Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)