
MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

Sam Foreman (ALCF)
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Published: July 31, 2023 @ Lattice 2023 (Fermilab)
Modified: June 17, 2025

Overview

  1. Background: {MCMC,HMC}
    • Leapfrog Integrator
    • Issues with HMC
    • Can we do better?
  2. L2HMC: Generalizing MD
    • 4D SU(3) Model
    • Results
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

🎯 Goal

Generate independent samples \{x_{i}\}, such that[1] \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}


Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
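To make the role of \tau^{\mathcal{O}}_{\mathrm{int}} concrete, here is a minimal numpy sketch (not from the talk; the windowing constant c = 5 and the synthetic AR(1) chain are illustrative) of estimating \tau_{\mathrm{int}} and the corrected error:

```python
import numpy as np

def tau_int(obs: np.ndarray, c: float = 5.0) -> float:
    """Windowed estimate of tau_int = 1 + 2 * sum_t rho(t),
    truncated once t >= c * tau (automatic windowing)."""
    x = obs - obs.mean()
    n = len(x)
    # normalized autocorrelation function rho(t) = C(t) / C(0)
    acf = np.array([x[: n - t] @ x[t:] / (n - t) for t in range(n // 2)])
    rho = acf / acf[0]
    tau = 1.0
    for t in range(1, len(rho)):
        tau += 2.0 * rho[t]
        if t >= c * tau:
            break
    return max(tau, 1.0)

# correlated chains pay a factor of tau_int in the variance:
# sigma_O^2 = (tau_int / N) * Var[O(x)]
chain = np.zeros(10_000)
for i in range(1, len(chain)):  # AR(1) stand-in for O(x_n) measurements
    chain[i] = 0.9 * chain[i - 1] + np.random.randn()
err = np.sqrt(tau_int(chain) * chain.var() / len(chain))
```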


Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow x_{2} \rightarrow \cdots \rightarrow x_{N}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

🪄 Trick
  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}❗

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

🔋 Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

🔋 Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

🐸 Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

⚠️ Warning!
  • Resample v_{0} \sim \mathcal{N}(0, \mathbb{1}) at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
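A minimal Python sketch of the leapfrog step above, assuming grad_S(x) returns the force \partial_{x} S(x):

```python
import numpy as np

def leapfrog_step(x, v, eps, grad_S):
    """One leapfrog step (x, v) -> (x', v') for H(x, v) = S(x) + v.v / 2."""
    v_tilde = v - 0.5 * eps * grad_S(x)          # Gamma: half-step momentum kick
    x_new = x + eps * v_tilde                    # Lambda: full-step position drift
    v_new = v_tilde - 0.5 * eps * grad_S(x_new)  # Gamma: closing half-kick
    return x_new, v_new
```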

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps[2] \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion,
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
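Putting the pieces together, a sketch of one full HMC update, reusing leapfrog_step from the sketch above (x and v are 1-D arrays here); for leapfrog the Jacobian factor is unity, so the acceptance reduces to \min\{1, e^{-\Delta H}\}:

```python
import numpy as np

def hmc_update(x, S, grad_S, eps=0.1, n_lf=10, rng=np.random):
    """One HMC update: resample v, run n_lf leapfrog steps, then
    Metropolis-Hastings accept/reject the proposal x'."""
    v = rng.standard_normal(x.shape)
    x_new, v_new = x.copy(), v.copy()
    for _ in range(n_lf):
        x_new, v_new = leapfrog_step(x_new, v_new, eps, grad_S)
    h0 = S(x) + 0.5 * v @ v
    h1 = S(x_new) + 0.5 * v_new @ v_new
    if rng.random() < min(1.0, np.exp(h0 - h1)):  # A = min{1, e^{-dH}}
        return x_new, True
    return x, False
```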

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

Figure 3: HMC samples generated with varying step sizes \varepsilon (left: \varepsilon=0.25; right: \varepsilon=0.5)

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
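As a sketch (assuming x_p holds the plaquette angles x_{P} of a single configuration):

```python
import numpy as np

def topological_charge(x_p: np.ndarray) -> int:
    """Q = (1 / 2pi) * sum_P floor(x_P), where floor(x_P) projects each
    plaquette angle into [-pi, pi), as defined above."""
    proj = x_p - 2 * np.pi * np.floor((x_p + np.pi) / (2 * np.pi))
    return int(np.rint(proj.sum() / (2 * np.pi)))
```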

⏳ Critical Slowing Down
  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note \delta Q \rightarrow 0 at increasing \beta

Can we do better?

  • Introduce two (invertible NNs) vNet and xNet[3]:
    • vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
    • xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)

 

  • Use these (s, t, q) in the generalized MD update:
    • \Gamma_{\theta}^{\pm} : ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
    • \Lambda_{\theta}^{\pm} : (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)
Figure 4: Generalized MD update where \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} are invertible NNs

L2HMC: Generalizing the MD Update

  • Introduce d \sim \mathcal{U}(\pm) to determine the direction of our update

    1. \textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v}) (update v)

    2. \textcolor{#AE81FF}{x'} = x_{B} + \Lambda^{\pm}(x_{A}, {v'}) (update first half: x_{A})

    3. \textcolor{#AE81FF}{x''} = x'_{A} + \Lambda^{\pm}(x'_{B}, {v'}) (update other half: x_{B})

    4. \textcolor{#07B875}{v''} = \Gamma^{\pm}({x''}, \textcolor{#07B875}{v'}) (update v)

  • Resample both v \sim \mathcal{N}(0, 1) and d \sim \mathcal{U}(\pm) at the beginning of each trajectory
    • To ensure ergodicity + reversibility, we split the x update into two sequential (complementary) updates
  • Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)
Figure 5: Generalized MD update with \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} invertible NNs
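A sketch of the generalized (forward, d = +) momentum update, with v_net standing in for the trained vNet and grad_S for the force; the exponential-scaling form matches the v-update formulas given in the Extras:

```python
import numpy as np

def gamma_plus_v(x, v, eps, grad_S, v_net):
    """v' = v * exp(eps/2 * s_v) - eps/2 * [F * exp(eps * q_v) + t_v],
    where (s_v, t_v, q_v) = v_net(x, F) and F = dS/dx is the force.
    The log-Jacobian (eps/2) * sum(s_v) feeds the MH acceptance."""
    force = grad_S(x)
    s, t, q = v_net(x, force)
    v_new = v * np.exp(0.5 * eps * s) - 0.5 * eps * (force * np.exp(eps * q) + t)
    logdet = 0.5 * eps * s.sum()
    return v_new, logdet
```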

L2HMC: Leapfrog Layer

L2HMC Update

👨‍💻 Algorithm
  1. input: x

    • Resample: \textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1}); \,\,{d\sim\mathcal{U}(\pm)}
    • Construct initial state: \textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})
  2. forward: Generate proposal \xi' by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#939393} \xi \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})

    • Accept / Reject: \begin{equation*} A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})= \mathrm{min}\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\} \end{equation*}
  3. backward (if training):

    • Evaluate the loss function[4] \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}) and backprop
  4. return: \textcolor{#AE81FF}{x}_{i+1}
    Evaluate the MH criterion and return the accepted config, \textcolor{#AE81FF}{{x}_{i+1}}\gets \begin{cases} \textcolor{#AE81FF}{x''} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#AE81FF}{x} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi'}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫 \end{cases}

Figure 6: Leapfrog Layer used in generalized MD update

4D SU(3) Model

🔗 Link Variables
  • Write link variables U_{\mu}(x) \in SU(3):

    \begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}

    where \omega^{k}_{\mu}(x) \in \mathbb{R}, and \lambda^{k} are the generators of SU(3)

🏃‍♂️‍➡️ Conjugate Momenta
  • Introduce P_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k} conjugate to \omega^{k}_{\mu}(x)

🟥 Wilson Action

S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]

where U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu}) U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)

Figure 7: Illustration of the lattice

HMC: 4D SU(3)

Hamiltonian: H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow

  • U update: \frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P, so

    \begin{align*} Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \Longrightarrow \\ -i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\ U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)} U(0) \Longrightarrow \\ \textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U \end{align*}

\textcolor{#FFEE58}{\varepsilon} is the step size

  • P update: \frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow

    \begin{align*} P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \textcolor{#FFEE58}{\varepsilon}\, \textcolor{#E599F7}{F[U]} \\ \textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U] \end{align*}

\textcolor{#E599F7}{F[U]} is the force term

HMC: 4D SU(3)

  • Momentum Update: \textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]

  • Link Update: \textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad

  • We maintain a batch of Nb lattices, all updated in parallel

    • U.dtype = complex128
    • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
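As an illustrative sketch of this batched layout (numpy + scipy here; the lattice sizes are made up, and a production code would batch the matrix exponential on the accelerator):

```python
import numpy as np
from scipy.linalg import expm

# Batched link field: one SU(3) matrix per (lattice, direction, site).
Nb, Nt, Nx, Ny, Nz = 8, 4, 4, 4, 4
shape = (Nb, 4, Nt, Nx, Ny, Nz, 3, 3)
U = np.broadcast_to(np.eye(3, dtype=np.complex128), shape).copy()  # cold start
P = np.zeros(shape, dtype=np.complex128)  # conjugate momenta in su(3)

# Link update Lambda: U' = exp(i * eps * P') U, applied to every link of
# every lattice in the batch (a naive loop for clarity).
eps = 0.05
flat_P, flat_U = P.reshape(-1, 3, 3), U.reshape(-1, 3, 3)
U_new = np.stack(
    [expm(1j * eps * p) @ u for p, u in zip(flat_P, flat_U)]
).reshape(shape)
```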

Networks 4D SU(3)

U-Network:

  UNet: (U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)

P-Network:

  PNet: (U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right) \longleftarrow let's look at this


P-Network (pt. 1)

  • input[5]: \left(U, F\right) := (e^{i Q}, F) \begin{align*} h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\ \textcolor{#FF5252}{z} &:= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \end{align*}
  • output[6]: (s_{P}, t_{P}, q_{P})

    • s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s)
    • t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}
    • q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})
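A hypothetical PyTorch sketch of this architecture (the layer sizes, depth, and the flattening of (Q, F) into feature vectors are our assumptions, not the project's actual code):

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Sketch of the P-network above: maps flattened (Q, F) features
    to (s_P, t_P, q_P)."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.w_q = nn.Linear(dim, hidden)              # w_Q Q + b
        self.w_f = nn.Linear(dim, hidden, bias=False)  # + w_F F
        self.h = nn.Linear(hidden, hidden)
        self.w_s, self.w_t, self.w_qout = (nn.Linear(hidden, dim) for _ in range(3))
        # lambda_s, lambda_q: trainable output scales (cf. footnote [6])
        self.lam_s = nn.Parameter(torch.ones(dim))
        self.lam_q = nn.Parameter(torch.ones(dim))

    def forward(self, Q: torch.Tensor, F: torch.Tensor):
        h0 = torch.relu(self.w_q(Q) + self.w_f(F))   # h0 = sigma(wQ Q + wF F + b)
        z = torch.relu(self.h(h0))                   # z: last hidden layer
        s = self.lam_s * torch.tanh(self.w_s(z))     # s_P
        t = self.w_t(z)                              # t_P
        q = self.lam_q * torch.tanh(self.w_qout(z))  # q_P
        return s, t, q
```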

P-Network (pt. 2)

  • Use (s_{P}, t_{P}, q_{P}) to update \Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)[7]:

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}
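To make the invertibility concrete (cf. footnote [7]), a small numpy sketch of the forward/backward pair, treating P, F, s, t, q elementwise rather than as su(3) matrices; the round-trip check verifies \Gamma^{-}\left[\Gamma^{+}(P)\right] = P exactly:

```python
import numpy as np

def gamma_plus(P, F, s, t, q, eps):
    """Forward (d = +): P_+ = P * exp(eps/2 * s_P) - eps/2 * [F * exp(eps * q_P) + t_P]."""
    return P * np.exp(0.5 * eps * s) - 0.5 * eps * (F * np.exp(eps * q) + t)

def gamma_minus(P, F, s, t, q, eps):
    """Backward (d = -): exactly inverts gamma_plus."""
    return np.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * np.exp(eps * q) + t))

# round-trip check: Gamma^-(Gamma^+(P)) == P
rng = np.random.default_rng(0)
P, F, s, t, q = (rng.standard_normal(4) for _ in range(5))
assert np.allclose(gamma_minus(gamma_plus(P, F, s, t, q, 0.1), F, s, t, q, 0.1), P)
```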

Results: 2D U(1)

📈 Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Interpretation

Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory (panels: deviation in x_{P}; topological charge mixing; artificial influx of energy).

Interpretation

Figure 9: (left) Average plaquette \langle x_{P}\rangle vs LF step; (right) average energy H - \sum\log|\mathcal{J}|. The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D SU(3) Results

  • Distribution of \log|\mathcal{J}| over all chains, at each leapfrog step, N_{\mathrm{LF}} (= 0, 1, \ldots, 8) during training:
Figure 10: 100 train iters
Figure 11: 500 train iters
Figure 12: 1000 train iters

4D SU(3) Results: \delta U_{\mu\nu}

Figure 13: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

Figure 14: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

Next Steps

  • Further code development

    • saforem2/l2hmc-qcd
  • Continue to use / test different network architectures

    • Gauge equivariant NNs for U_{\mu}(x) update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

samforeman.me

saforem2

@saforem2

[email protected]

🙏 Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Links:
    • Link to github
    • reach out!
  • References:
    • Link to slides
      • link to github with slides
    • (Foreman et al. 2022; Foreman, Jin, and Osborn 2022, 2021)
    • (Boyda et al. 2022; Shanahan et al. 2022)
  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

Links

  • saforem2/l2hmc-qcd

  • 📊 slides (Github: saforem2/lattice23)

References

  • Title Slide Background (worms) animation
    • Github: saforem2/grid-worms-animation
  • Link to HMC demo

References


Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

(a) Trained model
(b) Generic HMC
Figure 16: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for the trained model (a) vs. HMC (b)

Plaquette analysis: x_{P}

Deviation from V\rightarrow\infty limit, x_{P}^{\ast}

Average \langle x_{P}\rangle, with x_{P}^{\ast} (dotted-lines)

Figure 17: Plot showing how average plaquette, \left\langle x_{P}\right\rangle varies over a single trajectory for models trained at different \beta, with varying trajectory lengths N_{\mathrm{LF}}

Loss Function

  • Want to maximize the expected squared charge difference[8]: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability[9] of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
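A one-line PyTorch sketch of this loss (the names are ours); minimizing the negative maximizes the expected squared charge difference, weighted by the acceptance probability:

```python
import torch

def l2hmc_loss(dq: torch.Tensor, acc: torch.Tensor) -> torch.Tensor:
    """L_theta(xi*, xi) = E[ -dQ^2(xi*, xi) * A(xi*|xi) ], averaged over
    the batch of chains; loss.backward() then drives the `backward`
    step of the algorithm above."""
    return -(dq.pow(2) * acc).mean()
```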

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
  • v-Network:

    • \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \longleftarrow let's look at this
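A short numpy sketch of the link embedding above (the sizes are illustrative):

```python
import numpy as np

# Angles x in [-pi, pi) with shape [Nb, 2, Nt, Nx] are stacked as
# [cos(x), sin(x)] -> shape [Nb, 2, Nt, Nx, 2]: a real-valued input
# that the x- and v-networks can consume.
Nb, Nt, Nx = 8, 16, 16
x = np.random.uniform(-np.pi, np.pi, size=(Nb, 2, Nt, Nx))
x_embedded = np.stack([np.cos(x), np.sin(x)], axis=-1)
assert x_embedded.shape == (Nb, 2, Nt, Nx, 2)
```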

v-Update[10]

  • forward (d = \textcolor{#FF5252}{+}):

\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}

x-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}

Lattice Gauge Theory (2D U(1))

🔗 Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

🫸 Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the sum of the link phases around a 1\times 1 square, called a “plaquette”
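A sketch of this action for a single configuration, assuming angles x with shape [2, Nt, Nx] and periodic boundaries via np.roll:

```python
import numpy as np

def wilson_action_2d_u1(x: np.ndarray, beta: float) -> float:
    """S_beta(x) = beta * sum_P cos(x_P) for link angles x with shape
    [2, Nt, Nx] (mu = 0: time direction, mu = 1: space direction)."""
    x0, x1 = x[0], x[1]
    # x_P = x_mu(n) + x_nu(n + mu_hat) - x_mu(n + nu_hat) - x_nu(n)
    x_p = x0 + np.roll(x1, -1, axis=0) - np.roll(x0, -1, axis=1) - x1
    return beta * np.cos(x_p).sum()
```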

Figure 18: 2D lattice (Jupyter Notebook)

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
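As a sketch, one simple choice of schedule (targeting p_{t}(x) \propto e^{-\gamma_{t} S(x)} at step t is our reading of the rescaling, for illustration only):

```python
import numpy as np

# gamma_t rises monotonically to 1 in small increments, flattening the
# energy barriers early in training.
n_steps = 100
gammas = np.linspace(0.1, 1.0, n_steps)  # gamma_0 < gamma_1 < ... < gamma_N = 1
assert np.all(np.diff(gammas) > 0) and np.isclose(gammas[-1], 1.0)
```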


Toy Example: GMM \in \mathbb{R}^{2}

Figure 19

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to continuum limit
Figure 20: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.


Footnotes

  1. Here, \sim means “is distributed according to”

  2. We always start by resampling the momentum, v_{0} \sim \mathcal{N}(0, \mathbb{1})

  3. L2HMC: (Foreman, Jin, and Osborn 2021, 2022)

  4. For the simple \mathbf{x} \in \mathbb{R}^{2} example, \mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}

  5. \sigma(\cdot) denotes an activation function

  6. \lambda_{s}, \lambda_{q} \in \mathbb{R} are trainable parameters

  7. Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)

  8. Where \xi^{\ast} is the proposed configuration (prior to Accept / Reject)

  9. And \left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right| is the Jacobian of the transformation from \xi \rightarrow \xi^{\ast}

  10. Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)

Citation

For attribution, please cite this work as:
Foreman, Sam. 2023. “MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory.” Presentation at the 2023 International Symposium on Lattice Field Theory (Lattice 2023), July 31. https://indico.fnal.gov/event/57249/contributions/271305/. https://saforem2.github.io/lattice23.
🕸️ l2hmc-qcd Example: 4D SU(3)
Test Rendering on Mobile
Source Code
---
title: "MLMC: Machine Learning Monte Carlo"
location: Lattice 2023 (Fermilab)
location-url: "https://indico.fnal.gov/event/57249/contributions/271305/"
# location: "[Lattice 2023](https://indico.fnal.gov/event/57249/contributions/271305/)"
image: "./assets/thumbnail.png"
# https://github.com/saforem2/lattice23/blob/main/assets/thumbnail.png?raw=true"
date: 2023-07-31
categories:
    - ai4science
    - MCMC
    - Lattice QCD
# date-modified: last-modified
title-block-categories: false
number-sections: false
bibliography: references.bib
appendix-cite-as: display
favicon: "./assets/favicon.svg"
callout-style: simple
twitter-card:
  image: "https://github.com/saforem2/lattice23/blob/main/assets/thumbnail.png?raw=true"
  site: "saforem2"
  creator: "saforem2"
citation:
   author: Sam Foreman
   type: speech
   genre: "Presentation at the 2023 International Symposium on Lattice Field Theory"
   container-title: https://indico.fnal.gov/event/57249/contributions/271305/
   title: "MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory"
   url: https://saforem2.github.io/lattice23
   abstract: | 
     We present a trainable framework for efficiently generating gauge
     configurations, and discuss ongoing work in this direction. In particular, we
     consider the problem of sampling configurations from a 4D 𝑆𝑈(3) lattice gauge
     theory, and consider a generalized leapfrog integrator in the molecular
     dynamics update that can be trained to improve sampling efficiency.
format:
  html:
    # reference-location: section
    # toc-location: right
    page-layout: full
    # grid:
    #   body-width: 800px
  revealjs:
    slides-url: https://samforeman.me/talks/lattice23/slides.html
    # template-partials:
    #   - ./title-slide.html
      # - ./title-fancy/title-slide.html
      # - ./title_slide_template.html
      # - ../../title-slide.html
    title-slide-attributes:
      data-background-iframe: https://saforem2.github.io/grid-worms-animation/
      data-background-size: contain
      data-background-color: "#1c1c1c"
      background-color: "#1c1c1c"
    title-block-style: none
    slide-number: c
    title-slide-style: default
    chalkboard:
      buttons: false
    auto-animate: true
    reference-location: section
    touch: true
    pause: false
    footnotes-hover: true
    citations-hover: true
    preview-links: true
    controls-tutorial: true
    controls: false
    logo: "https://raw.githubusercontent.com/saforem2/anl-job-talk/main/docs/assets/anl.svg"
    history: false
    # theme: [css/dark.scss]
    callout-style: simple
    # css: [css/default.css, css/callouts.css]
    fig-align: center
    css:
      # - css/default.css
      - ../../css/custom.css
    theme:
      # - light:
      - white
      - ../../css/title-slide-template.scss
      - ../../css/reveal/reveal.scss
      - ../../css/common.scss
      - ../../css/dark.scss
      - ../../css/syntax-dark.scss
      - ../../css/callout-cards.scss
    # callout-style: simple
    # css:
    #   # - css/default.css
    #   - ../../css/custom.css
    # theme:
    #   # - light:
    #   # - css/dark.scss
    #   - ../../css/title-slide-template.scss
    #   - ../../css/reveal/reveal.scss
    #   - ../../css/common.scss
    #   - ../../css/dark.scss
    #   - ../../css/syntax-dark.scss
    #   - ../../css/callout-cards.scss
    # css:
    #   # - css/default.css
    #   - ../../css/custom.css
    # theme:
    #   # - black
    #   # - ./title-fancy/title-slide-template.scss
    #   - ../../css/reveal/reveal.scss
    #   - ../../css/common.scss
    #   - ../../css/dark.scss
    #   - ../../css/syntax-dark.scss
    #   - ../../css/callout-cards.scss
    #   - css/dark.scss
    self-contained: false
    embed-resources: false
    self-contained-math: false
    center: true
    highlight-style: "atom-one"
    default-image-extension: svg
    code-line-numbers: true
    data-background-color: "#1c1c1c"
    background-color: "#1c1c1c"
    code-overflow: scroll
    html-math-method: katex
    output-file: "slides.html"
    mermaid:
      theme: neutral
  # gfm:
  #   output-file: "lattice23.md"
---

# {.title-slide background-color="#1c1c1c" background-iframe="https://saforem2.github.io/grid-worms-animation/" loading="lazy"}

::: {style="background-color: rgba(22,22,22,0.75); border-radius: 10px; text-align:center; padding: 0px; padding-left: 1.5em; padding-right: 1.5em; max-width: min-content; min-width: max-content; margin-left: auto; margin-right: auto; padding-top: 0.2em; padding-bottom: 0.2em; line-height: 1.5em!important;"}
<span style="color:#939393; font-size:1.5em; font-weight: bold;">MLMC: Machine Learning Monte Carlo</span>  
<span style="color:#777777; font-size:1.2em; font-weight: bold;">for Lattice Gauge Theory</span>  
[<br>&nbsp;]{style="padding-bottom: 0.5rem;"}  
[{{< fa solid home >}}](https://samforeman.me) Sam Foreman  
[Xiao-Yong Jin, James C. Osborn]{.dim-text style="font-size:0.8em;"}  
[[[{{< fa brands github >}} `saforem2/`](https://github.com/saforem2/)]{style="border-bottom: 0.5px solid #00ccff;"}`{`[[`lattice23`](https://github.com/saforem2/lattice23)]{style="border-bottom: 0.5px solid #00ccff;"}, [[`l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)]{style="border-bottom: 0.5px solid #00ccff;"}`}`]{style="font-size:0.8em;"}
:::

::: footer
[2023-07-31 @ [Lattice 2023](https://indico.fnal.gov/event/57249/contributions/271305/)]{.dim-text style="text-align:left;'}
:::

# Overview {background-color="#1c1c1c"}

1. [Background: `{MCMC,HMC}`](#markov-chain-monte-carlo-mcmc-centeredslide)
    - [Leapfrog Integrator](#leapfrog-integrator-hmc-centeredslide)
    - [Issues with HMC](#sec-issues-with-hmc)
    - [Can we do better?](#sec-can-we-do-better)
2. [L2HMC: Generalizing MD](#sec-l2hmc)
    - [4D $SU(3)$ Model](#sec-su3)
    - [Results](#sec-results)
3. [References](#sec-references)
4. [Extras](#sec-extras)


# Markov Chain Monte Carlo (MCMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🎯 Goal" style="text-align:left;!important; width: 100%!important;"}
Generate **independent** samples $\{x_{i}\}$, such that[^notation1]
$$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$
where $S(x)$ is the _action_ (or potential energy)
:::

- Want to calculate observables $\mathcal{O}$:  
  $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}$

:::

::: {.column width="49%"}
![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/normal_distribution.dark.svg)
:::

::::

If these were <span style="color:#00CCFF;">independent</span>, we could approximate:
  $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})$  
  $$\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x)
  \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$$

[^notation1]: Here, $\sim$ means "is distributed according to"

::: footer
[{{< fa brands github >}} `saforem2/lattice23`](https://saforem2.github.io/lattice23)
:::

# Markov Chain Monte Carlo (MCMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🎯 Goal" style="text-align:left;!important; width: 100%!important;"}
Generate **independent** samples $\{x_{i}\}$, such that[^notation2]
$$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$
where $S(x)$ is the _action_ (or potential energy)
:::

- Want to calculate observables $\mathcal{O}$:  
  $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}$

:::

::: {.column width="49%"}
![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/normal_distribution.dark.svg)
:::

::::

Instead, nearby configs are [correlated]{.red-text}, and we incur a factor of
$\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}$:
  $$\sigma_{\mathcal{O}}^{2} =
  \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O}
  (x) \right]}$$

[^notation2]: Here, $\sim$ means "is distributed according to"

::: footer
[{{< fa brands github >}} `saforem2/lattice23`](https://github.com/saforem2/lattice23)
:::

# Hamiltonian Monte Carlo (HMC) {.center background-color="#1c1c1c"}

- Want to (sequentially) construct a chain of states:
  $$x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}$$

  such that, as $N \rightarrow \infty$:
  $$\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x)
  \propto e^{-S(x)}$$

::: {.callout-tip icon=false collapse=false title="🪄 Trick" style="display:inline!important; width: 100%!important;"}
  - Introduce [fictitious]{.green-text} momentum $v \sim \mathcal{N}(0, \mathbb{1})$
    - Normally distributed **independent** of $x$, i.e.
  $$\begin{align*}
    p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v}
    = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]}
    \textcolor{#02b875}{=} e^{-H(x, v)}
  \end{align*}$$
:::

## Hamiltonian Monte Carlo (HMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="55%"}

- [**Idea**]{.green-text}: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$❗

- Write the **joint distribution** $p(x, v)$:
  $$
  p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}
  $$
:::

::: {.column width="45%"}

::: {.callout-tip collapse=false icon=false title="🔋 Hamiltonian Dynamics" style="width:100%!important;"}
$H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow$
$$\dot{x} = +\partial_{v} H,
\,\,\dot{v} = -\partial_{x} H$$
:::

:::

::::

::: {#fig-hmc-traj}

![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/hmc1.svg){.r-stretch}

Overview of HMC algorithm
:::

## Leapfrog Integrator (HMC) {#sec-leapfrog background-color="#1c1c1c"}

:::: {.columns style="font-size: 0.9em;" height="100%"}

::: {.column width="40%"}

::: {.callout-tip collapse=false icon=false title="🔋 Hamiltonian Dynamics" style="width:100%!important;"}
$\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$
:::

::: {.callout-note collapse=false icon=false title="🐸 Leapfrog Step" style="width:100%!important;"}
`input` $\,\left(x, v\right) \rightarrow \left(x', v'\right)\,$ `output`

$$\begin{align*}
\tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, =  x + \varepsilon \, \tilde{v} \\
v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x')
\end{align*}$$

:::

::: {.callout-warning collapse=false icon=false title="⚠️ Warning!" style="width:100%!important;"}
- Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$  
at the [beginning]{.yellow-text} of each trajectory
:::

::: {style="font-size:0.8em; margin-left:13%;"}
[**Note**: $\partial_{x} S(x)$ is the _force_]{.dim-text}
:::

:::

::: {.column width="55%" style="text-align:left;"}

![](./assets/hmc1/hmc-update-light.svg){width=60%}

:::

::::

## HMC Update {background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="65%"}
- We build a trajectory of $N_{\mathrm{LF}}$ **leapfrog steps**[^v0]
  $$\begin{equation*}
  (x_{0}, v_{0})%
  \rightarrow (x_{1}, v_{1})\rightarrow \cdots%
  \rightarrow (x', v')
  \end{equation*}$$

- And propose $x'$ as the next state in our chain

$$\begin{align*}
  \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
  \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v
\end{align*}$$

- We then accept / reject $x'$ using Metropolis-Hastings criteria,  
  $A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$

:::

::: {.column width="30%"}

![](./assets/hmc1/hmc-update-light.svg)

:::

::::

[^v0]: We **always** start by resampling the momentum, $v_{0} \sim
\mathcal{N}(0, \mathbb{1})$

## HMC Demo {.centeredslide background-color="#1c1c1c"}

::: {#fig-hmc-demo}

<iframe data-src="https://chi-feng.github.io/mcmc-demo/app.html" width="90%" height="500" title="l2hmc-qcd"></iframe>

HMC Demo
:::

# Issues with HMC {#sec-issues style="font-size:0.9em;" background-color="#1c1c1c"}

- What do we want in a good sampler?
  - **Fast mixing** (small autocorrelations)
  - **Fast burn-in** (quick convergence)

- Problems with HMC:
  - Energy levels selected randomly $\rightarrow$ **slow mixing**
  - Cannot easily traverse low-density zones $\rightarrow$ **slow convergence**

::: {#fig-hmc-issues layout-ncol=2}
![HMC Samples with $\varepsilon=0.25$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/hmc_traj_eps025.svg)

![HMC Samples with $\varepsilon=0.5$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/hmc_traj_eps05.svg)

HMC Samples generated with varying step sizes $\varepsilon$
:::

# Topological Freezing {.center background-color="#1c1c1c"}

:::: {.flex-container style="text-align: center; width: 100%"}

::: {.col1 width="45%" style="text-align: left; font-size: 0.9em;"}

**Topological Charge**:
$$Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor  \in \mathbb{Z}$$

[**note:** $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi
\left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$]{.dim-text style="font-size:0.8em;"}


::: {.callout-important collapse=false icon=false title="⏳ Critical Slowing Down" style="text-align:left; width:100%!important;"}
- $Q$ gets stuck!
    - as $\beta\longrightarrow \infty$:
        - $Q \longrightarrow \text{const.}$
        - $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}$
    - \# configs required to estimate errors  
    **grows exponentially**:
    [$\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$]{.red-text}
:::

:::

::: {.flex-container width="45%"}

![Note $\delta Q \rightarrow 0$ at increasing
$\beta$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/critical_slowing_down.svg){width="80%"}

:::

::::

# Can we do better? {#sec-can-we-do-better background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

- Introduce two (**invertible NNs**) `vNet` and `xNet`[^l2hmc]:
  - [`vNet: ` $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{style="font-size:0.9em;"}  
  - [`xNet: ` $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{style="font-size:0.9em;"}

&nbsp;  

- Use these $(s, t, q)$ in the _generalized_ MD update:
  - [[$\Gamma_{\theta}^{\pm}$]{.pink-text} $: ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})$]{}
  - [[$\Lambda_{\theta}^{\pm}$]{.orange-text} $: (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)$]{}

:::

::: {.column width="48%"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:85%; text-align:center;"}

Generalized MD update where [$\Lambda_{\theta}^{\pm}$]{.orange-text},
[$\Gamma_{\theta}^{\pm}$]{.pink-text} are **invertible NNs**

:::

:::

::::

[^l2hmc]: [L2HMC: ](https://github.com/saforem2/l2hmc-qcd) {{< fa solid book >}}
[@Foreman:2021ixr; @Foreman:2021rhs]

# L2HMC: Generalizing the MD Update {#sec-l2hmc .smaller .centeredslide background-color="#1c1c1c"}

:::: {.flex-container style="text-align: left; width: 100%"}

::: {.col1 style="width:45%; font-size: 0.8em;"}

::: {style="border:1px solid red;"}

- Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update

  1. [$\textcolor{#07B875}{v'} =$ [$\Gamma^{\pm}$]{.pink-text}$({x}, \textcolor{#07B875}{v})$]{} [$\hspace{46pt}$ update $v$]{.dim-text style="font-size:0.9em;"}

  2. [$\textcolor{#AE81FF}{x'} =$ [$x_{B}$]{.blue-text}$\,+\,$[$\Lambda^{\pm}$]{.orange-text}$($[$x_{A}$]{.red-text}$, {v'})$]{} [$\hspace{10pt}$ update first **half**: $x_{A}$]{.dim-text style="font-size:0.9em;"}

  3. [$\textcolor{#AE81FF}{x''} =$ [$x'_{A}$]{.red-text}$\,+\,$[$\Lambda^{\pm}$]{.orange-text}$($[$x'_{B}$]{.blue-text}$, {v'})$]{} [$\hspace{8pt}$ update other half: $x_{B}$]{.dim-text style="font-size:0.9em;"}

  4. [$\textcolor{#07B875}{v''} =$ [$\Gamma^{\pm}$]{.pink-text}$({x''}, \textcolor{#07B875}{v'})$]{} [$\hspace{36pt}$ update $v$]{.dim-text style="font-size:0.9em;"}

:::

::: {style="border:1px solid red;"}

- Resample both $v\sim \mathcal{N}(0, 1)$, and $d \sim \mathcal{U}(\pm)$ at the
beginning of each trajectory
  - To ensure ergodicity + reversibility, we split the [$x$]{.purple-text}
  update into sequential (complementary) updates
- Introduce directional variable $d \sim \mathcal{U}(\pm)$, resampled at the
beginning of each trajectory:
  - Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e.
  $$\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x,
  v)\right] = (x, v)$$

:::

:::

::: {.col2 style="width:55%;"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:85%; text-align:center;"}

Generalized MD update with [$\Lambda_{\theta}^{\pm}$]{.orange-text}, [$\Gamma_{\theta}^{\pm}$]{.pink-text} **invertible NNs**
:::

:::

::::

## L2HMC: Leapfrog Layer {.center width="100%" background-color="#1c1c1c"}

:::: {.flex-container}

::: {.column style="width: 35%;"}
![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/update_steps.svg){.absolute  top="30" width="40%"}
:::

::: {.column style="width:65%;"}
![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/leapfrog_layer_dark2.svg){width="100%"}
:::

![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/network_functions.svg){.absolute top=440 width="100%"}

::::

## L2HMC Update {style="font-size: 0.775em;" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="65%" style="font-size:0.7em;"}

::: {.callout-important collapse=false icon=false title="👨‍💻 Algorithm" style="text-align:left; width: 100%!important;"}

1. `input`: [$x$]{.purple-text}

    - Resample: $\textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1})$; $\,\,{d\sim\mathcal{U}(\pm)}$
    - Construct initial state:
     $\textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})$

2. `forward`: Generate [proposal $\xi'$]{style="color:#f8f8f8"} by passing [initial $\xi$]{style="color:#939393"} through $N_{\mathrm{LF}}$ leapfrog layers  
$$\textcolor{#939393} \xi \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})$$

    - Accept / Reject:
      $$\begin{equation*}
      A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})=
      \mathrm{min}\left\{1,
      \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\}
      \end{equation*}$$

5. `backward` (if training):  
    - Evaluate the **loss function**[^loss] $\mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi})$ and backprop
6. `return`: $\textcolor{#AE81FF}{x}_{i+1}$  
  Evaluate MH criteria $(1)$ and return accepted config, 
  $$\textcolor{#AE81FF}{{x}_{i+1}}\gets
  \begin{cases}
  \textcolor{#f8f8f8}{\textcolor{#AE81FF}{x''}} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi''}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\
  \textcolor{#939393}{\textcolor{#AE81FF}{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi''}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫
  \end{cases}$$

:::

:::

::: {.column width="35%"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:75%; text-align:center;"}

**Leapfrog Layer** used in generalized MD update
:::

:::

::::

[^loss]: 
    For simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} =
    A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$


# 4D $SU(3)$ Model {#sec-su3 .centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🔗 Link Variables" style="text-align:left; width: 100%!important;"}

- Write link variables $U_{\mu}(x) \in SU(3)$:

  $$ \begin{align*} 
  U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\
  &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3)
  \end{align*}$$ 

  [where [$\omega^{k}_{\mu}(x)$]{.purple-text} $\in \mathbb{R}$, and $\lambda^{k}$ are
  the generators of $SU(3)$]{style="font-size:0.9em;"}

:::

::: {.callout-tip collapse=false icon=false title="🏃‍♂️‍➡️ Conjugate Momenta" style="text-align:left; width:100%!important;"}

- Introduce [$P_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k}$]{.green-text} conjugate to
[$\omega^{k}_{\mu}(x)$]{.purple-text}

:::

::: {.callout-important collapse=false icon=false title="🟥 Wilson Action" style="text-align:left; width:100%!important;"}

$$ S_{G} = -\frac{\beta}{6} \sum
\mathrm{Tr}\left[U_{\mu\nu}(x)
+ U^{\dagger}_{\mu\nu}(x)\right] $$

where $U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu})
U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)$

:::


:::

::: {.column width="45%"}

::: {#fig-4dlattice}

![](./assets/u1lattice.dark.svg){width="90%"}

Illustration of the lattice
:::

:::

::::

## HMC: 4D $SU(3)$ {#sec-hmcsu3 background-color="#1c1c1c"}

Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

:::: {.columns}

::: {.column style="font-size:0.9em; text-align: center;"}

::: {.callout collapse=false style="text-align:left; width: 100%!important;"}

- [$U$ update]{style="border-bottom: 2px solid #AE81FF;"}:
[$\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}}$]{.purple-text style="font-size:1.5em;"}
$$\frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$$
$$\begin{align*}
Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0)\Longrightarrow\\
-i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\
U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)} U(0)\Longrightarrow \\
&\hspace{1pt}\\
\textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U
\end{align*}$$

:::

::: aside
[$\textcolor{#FFEE58}{\varepsilon}$ is the step size]{.dim-text style="font-size:0.8em;"}
:::

:::

::: {.column style="font-size:0.9em; text-align: center;"}
::: {.callout collapse=false style="text-align:left; width: 100%!important;"}
- [$P$ update]{style="border-bottom: 2px solid #07B875;"}:
[$\frac{dP^{k}}{dt} = - \frac{\partial H}{\partial \omega^{k}}$]{.green-text style="font-size:1.5em;"} 
$$\frac{dP^{k}}{dt} = - \frac{\partial H}{\partial \omega^{k}}
= -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ}\Longrightarrow$$
$$\begin{align*}
P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} \\
&= P(0) - \textcolor{#FFEE58}{\varepsilon} \,\textcolor{#E599F7}{F[U]} \\
&\hspace{1pt}\\
\textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U]
\end{align*}$$
:::

::: aside
[$\textcolor{#E599F7}{F[U]}$ is the force term]{.dim-text style="font-size:0.8em;"}
:::
:::

::::


## HMC: 4D $SU(3)$ {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="47%" style="text-align:left;"}

- [Momentum Update]{style="border-bottom: 2px solid #F06292;"}:
  $$\textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$$

- [Link Update]{style="border-bottom: 2px solid #FD971F;"}:
  $$\textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad$$

- We maintain a batch of `Nb` lattices, all updated in parallel
  - $U$`.dtype = complex128`
  - $U$`.shape`  
    [`= [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]`]{style="font-size: 0.95em;"}

:::

::: {.column width="47%" style="text-align:right;"}

![](./assets/hmc/hmc-update-light.svg){width=60%}

:::

::::

# Networks 4D $SU(3)$ {#sec-su3networks .centeredslide auto-animate="true" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="54%" style="font-size:0.9em;"}

&nbsp;<br>  

&nbsp;<br>  

[$U$]{.purple-text}-Network:

  [`UNet: ` $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$]{style="font-size:0.9em;"}


&nbsp;<br>  

::: {style="border: 1px solid #1c1c1c; border-radius: 6px; padding:1%;"}

[$P$]{.green-text}-Network:

  [`PNet: ` $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$]{style="font-size:0.9em;"}  

:::

:::

::: {.column width="45%" style="text-align:right;"}

![](./assets/leapfrog-layer-4D-SU3-vertical.light.svg){width="80%"}

:::

::::

# Networks 4D $SU(3)$ {.centeredslide auto-animate="true" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="54%" style="font-size:0.9em;"}

&nbsp;<br>  

&nbsp;<br>  

[$U$]{.purple-text}-Network:

  [`UNet: ` $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$]{style="font-size:0.9em;"}

&nbsp;<br>  

::: {style="border: 1px solid #07B875; border-radius: 6px; padding:1%;"}

[$P$]{.green-text}-Network:

  [`PNet: ` $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$]{style="font-size:0.9em;"}  

:::

[$\uparrow$]{.dim-text}  
[let's look at this]{.dim-text style="padding-top: 0.5em!important;"}

:::

::: {.column width="45%" style="text-align:right;"}

![](./assets/leapfrog-layer-4D-SU3-vertical.light.svg){width="80%"}

:::

::::

## $P$-`Network` (pt. 1) {style="font-size:0.95em;" background-color="#1c1c1c"}

::: {style="text-align:center;"}

![](./assets/SU3/PNetwork.light.svg)

:::

:::: {.columns}

::: {.column width="50%"}

- [`input`[^sigma]: $\hspace{7pt}\left(U, F\right) := (e^{i Q}, F)$]{style="border-bottom: 2px solid rgba(131, 131, 131, 0.493);"}
  $$\begin{align*}
  h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\
  h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\
  &\vdots \\
  h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\
  \textcolor{#FF5252}{z} & := \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow \\
  \end{align*}$$

:::

::: {.column width="50%"}

- [`output`[^lambda1]: $\hspace{7pt} (s_{P}, t_{P}, q_{P})$]{style="border-bottom: 2px solid rgba(131, 131, 131, 0.5);"}

  - $s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s)$
  - $t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}$
  - $q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})$

:::

::::

[^sigma]: $\sigma(\cdot)$ denotes an activation function
[^lambda1]: $\lambda_{s}, \lambda_{q} \in \mathbb{R}$, trainable parameters

## $P$-`Network` (pt. 2) {style="font-size:0.9em;" background-color="#1c1c1c"}

::: {style="text-align:center;"}

![](./assets/SU3/PNetwork.light.svg)

:::

- Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow
\left(U, P_{\pm}\right)$[^inverse]:

    - [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:
    $$\Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]$$

    - [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$: 
    $$\Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}$$


[^inverse]: Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$

# Results: 2D $U(1)$ {#sec-results .centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width=50% style="align:top;"}
![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/autocorr_new.svg){width="90%"}
:::

::: {.column width="33%" style="text-align:left; padding-top: 5%;"}

::: {.callout-important icon=false collapse=false title="📈 Improvement" style="text-align:left!important; width: 100%!important;"}
We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the
[**trained model**]{style="color:#FF2052;"} vs.
[**HMC**]{style="color:#9F9F9F;"}.  
  
**Note**: [lower]{style="color:#FF2052;"} is better
:::

:::

::::

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/charge_histories.svg){.absolute top=400 left=0 width="100%" style="margin-bottom: 1em;margin-top: 1em;"}


## Interpretation {#sec-interpretation .centeredslide background-color="#1c1c1c"}

:::: {.columns style="margin-left:1pt;"}

::: {.column width="36%"}

[Deviation in $x_{P}$]{.dim-text style="text-align:center; font-size:0.8em"}

:::

::: {.column width="30%"}

[Topological charge mixing]{.dim-text style="text-align:right; font-size:0.8em"}

:::

::: {.column width="32%"}

[Artificial influx of energy]{.dim-text style="text-align:right!important; font-size:0.8em;"}

:::

::::

::: {#fig-interpretation}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/ridgeplots.svg){width="100%"}

Illustration of how different observables evolve over a single L2HMC
trajectory.
:::


## Interpretation {.centeredslide background-color="#1c1c1c"}

::: {#fig-energy-ridgeplot layout-ncol=2 layout-valign="top"}

![Average plaquette: $\langle x_{P}\rangle$ vs LF step](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/plaqsf_ridgeplot.svg)

![Average energy: $H - \sum\log|\mathcal{J}|$](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/Hf_ridgeplot.svg)

The trained model artificially increases the energy towards
the middle of the trajectory, allowing the sampler to tunnel between isolated
sectors.
:::


# 4D $SU(3)$ Results {#sec-su3results background-color="#1c1c1c"}


- Distribution of $\log|\mathcal{J}|$ over all chains, at each _leapfrog step_
$N_{\mathrm{LF}} = 0, 1, \ldots, 8$, during training:

::: {layout="[ [30, 30, 30] ]" layout-valign="center" style="display: flex; flex-direction: row; margin-top: -0.0em; align-items: center;"}

![`100` train iters](./assets/SU3/logdet_ridgeplot1.svg){#fig-ridgeplot1}

![`500` train iters](./assets/SU3/logdet_ridgeplot2.svg){#fig-ridgeplot2}

![`1000` train iters](./assets/SU3/logdet_ridgeplot3.svg){#fig-ridgeplot3}

:::

## 4D $SU(3)$ Results: $\delta U_{\mu\nu}$ {background-color="#1c1c1c"}

::: {#fig-pdiff}

![](./assets/SU3/pdiff.svg)

The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$
between the trained model and HMC.

:::

## 4D $SU(3)$ Results: $\delta U_{\mu\nu}$ {background-color="#1c1c1c"}

::: {#fig-pdiff-robust}

![](./assets/SU3/pdiff-robust.svg)

The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$
between the trained model and HMC.

:::

# Next Steps {#sec-next-steps background-color="#1c1c1c"}

- Further code development
  - {{< fa brands github >}} [`saforem2/l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)

- Continue to use / test different network architectures
  - Gauge equivariant NNs for $U_{\mu}(x)$ update

- Continue to test different loss functions for training

- Scaling:
  - Lattice volume
  - Network size
  - Batch size
  - \# of GPUs

## Thank you! {#sec-thank-you background-color="#1c1c1c"}

&nbsp;<br>  

::: {layout-ncol=4 style="text-align:left; font-size:0.8em;"}

[[{{< fa solid home >}} `samforeman.me`](https://samforeman.me)]{style="font-size:0.8em;"}

[[{{< fa brands github >}} `saforem2`](https://github.com/saforem2)]{style="font-size:0.8em;"}

[[{{< fa brands twitter >}} `@saforem2`](https://www.twitter.com/saforem2)]{style="font-size:0.8em;"}

[[{{< fa regular paper-plane >}} `[email protected]`](mailto:[email protected])]{style="font-size:0.8em;"}

:::

::: {.callout-note icon=false collapse=false title="🙏 Acknowledgements" style="width: 100%!important;"}

This research used resources of the Argonne Leadership Computing Facility,  
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

:::


## {#sec-l2hmc-gh background-color="#1c1c1c"}

::: {style="text-align:center;"}

[![](https://raw.githubusercontent.com/saforem2/l2hmc-qcd/main/assets/logo-small.svg)](https://github.com/saforem2/l2hmc-qcd)

<a href="https://hits.seeyoufarm.com"><img alt="hits" src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fsaforem2%2Fl2hmc-qcd&count_bg=%2300CCFF&title_bg=%23555555&icon=&icon_color=%23111111&title=👋&edge_flat=false"></a>
<a href="https://github.com/saforem2/l2hmc-qcd/"><img alt="l2hmc-qcd" src="https://img.shields.io/badge/-l2hmc--qcd-252525?style=flat&logo=github&labelColor=gray"></a> <a href="https://www.codefactor.io/repository/github/saforem2/l2hmc-qcd"><img alt="codefactor" src="https://www.codefactor.io/repository/github/saforem2/l2hmc-qcd/badge"></a>  

<a href="https://arxiv.org/abs/2112.01582"><img alt="arxiv" src="http://img.shields.io/badge/arXiv-2112.01582-B31B1B.svg"></a> <a href="https://arxiv.org/abs/2105.03418"><img alt="arxiv" src="http://img.shields.io/badge/arXiv-2105.03418-B31B1B.svg"></a>  

<a href="https://hydra.cc"><img alt="hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a> <a href="https://pytorch.org/get-started/locally/"><img alt="pyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://www.tensorflow.org"><img alt="tensorflow" src="https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?&logo=TensorFlow&logoColor=white"></a> 
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Weights & Biases monitoring" height=20>](https://wandb.ai/l2hmc-qcd/l2hmc-qcd)

:::


## Acknowledgements {#sec-acknowledgements background-color="#1c1c1c"}

:::: {.columns}


::: {.column width="50%"}

- **Links**:
  - [{{< fa brands github >}} Link to GitHub](https://github.com/saforem2/l2hmc-qcd)
  - [{{< fa solid paper-plane >}} Reach out!](mailto:[email protected])

- **References**:
  - [Link to slides](https://saforem2.github.io/lattice23/)
    - [{{< fa brands github >}} GitHub repo with slides](https://github.com/saforem2/lattice23)
  - {{< fa solid book >}} [@Foreman:2021ljl; @Foreman:2021rhs; @Foreman:2021ixr]
  - {{< fa solid book >}} [@Boyda:2022nmh; @Shanahan:2022ifi]

:::

::: {.column width="50%"}

- Huge thank you to:
  - Yannick Meurice
  - Norman Christ
  - Akio Tomiya
  - Nobuyuki Matsumoto
  - Richard Brower
  - Luchang Jin
  - Chulwoo Jung
  - Peter Boyle
  - Taku Izubuchi
  - Denis Boyda
  - Dan Hackett
  - ECP-CSD group
  - [**ALCF Staff + Datascience Group**]{.red-text}

:::

::::


## {#sec-references background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

### Links {background-color="#1c1c1c"}

- [{{< fa brands github >}} `saforem2/l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)

- [📊 slides](https://saforem2.github.io/lattice23) (Github: [{{< fa brands github >}} `saforem2/lattice23`](https://github.com/saforem2/lattice23))

:::

::: {.column width="50%"}

### References {background-color="#1c1c1c"}

- [Title Slide Background (worms) animation](https://saforem2.github.io/grid-worms-animation/)
  - Github: [{{< fa brands github >}} `saforem2/grid-worms-animation`](https://github.com/saforem2/grid-worms-animation)

- [Link to HMC demo](https://chi-feng.github.io/mcmc-demo/app.html)

:::

::::


## References {style="line-height:1.2em;" background-color="#1c1c1c"}

(I don't know why this is broken 🤷🏻‍♂️)

::: {#refs}
:::

# Extras {#sec-extras background-color="#1c1c1c"}

## Integrated Autocorrelation Time {.centeredslide background-color="#1c1c1c"}

::: {#fig-iat}
![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/tint1.svg){width="100%"}

Plot of the integrated autocorrelation time for both the trained model
(colored) and HMC (greyscale).
:::
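
For reference, a minimal NumPy sketch of how $\tau_{\mathrm{int}}$ can be estimated from a 1D chain of measurements, using an FFT-based autocorrelation function and Sokal's automatic-windowing heuristic (an illustrative simplification, not the analysis code used here):

```python
import numpy as np

def tau_int(x: np.ndarray, c: float = 5.0) -> float:
    """Estimate the integrated autocorrelation time of a 1D chain."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # autocovariance via FFT, zero-padded to avoid wraparound
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    rho = acf / acf[0]                   # normalized so rho(0) = 1
    taus = 2.0 * np.cumsum(rho) - 1.0    # tau_int(W) = 1 + 2 * sum_{t<=W} rho(t)
    # automatic window: smallest W with W >= c * tau_int(W)
    m = np.arange(n) < c * taus
    window = int(np.argmin(m)) if not m.all() else n - 1
    return float(taus[window])
```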

## Comparison {background-color="#1c1c1c"}

::: {#fig-comparison layout-ncol=2}

![Trained model](https://saforem2.github.io/anl-job-talk/assets/dQint_eval.svg){#fig-eval}

![Generic HMC](https://saforem2.github.io/anl-job-talk/assets/dQint_hmc.svg){#fig-hmc}

Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for the
trained model [@fig-eval] vs. HMC [@fig-hmc]
:::

## Plaquette analysis: $x_{P}$ {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="55%"}

[Deviation from $V\rightarrow\infty$ limit,  $x_{P}^{\ast}$]{.dim-text style="text-align:center; font-size:0.9em;"}
:::

::: {.column width="45%"}

[Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted-lines)]{.dim-text style="text-align:right!important; font-size:0.9em;"}
:::

::::

::: {#fig-avg-plaq}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/plaqsf_vs_lf_step1.svg){width="100%"}

Plot showing how the **average plaquette** $\left\langle x_{P}\right\rangle$
varies over a single trajectory for models trained at different $\beta$, with
varying trajectory lengths $N_{\mathrm{LF}}$.
:::

## Loss Function {background-color="#1c1c1c"}

- Want to maximize the _expected_ squared charge difference[^charge-diff]:
  $$\begin{equation*}
  \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) =
  {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2}
  \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big]
  \end{equation*}$$

- Where:
    - $\delta Q$ is the _tunneling rate_:
      $$\begin{equation*}
      \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right|
      \end{equation*}$$

    - $A(\xi^{\ast}|\xi)$ is the probability[^jacobian] of accepting the proposal $\xi^{\ast}$:
      $$\begin{equation*}
      A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1,
      \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial
      \xi^{T}}\right|\right)
      \end{equation*}$$

[^charge-diff]: Where $\xi^{\ast}$ is the _proposed_ configuration (prior to
Accept / Reject)
[^jacobian]: And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the
Jacobian of the transformation from $\xi \rightarrow \xi^{\ast}$
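
A minimal sketch of this loss in PyTorch (the tensor names and helper are hypothetical, not the l2hmc-qcd API):

```python
import torch

def l2hmc_loss(q_prop: torch.Tensor, q: torch.Tensor, acc_prob: torch.Tensor) -> torch.Tensor:
    """L_theta(xi*, xi) = E_{p(xi)}[ -(delta Q)^2 * A(xi* | xi) ].

    q_prop, q : topological charges of the proposed / current configs, shape [Nb]
    acc_prob  : acceptance probabilities A(xi* | xi), shape [Nb]
    """
    dq = torch.abs(q_prop - q)           # tunneling rate, delta Q
    return -(dq ** 2 * acc_prob).mean()  # expectation approximated by the batch average

# with p(xi) ∝ exp(-H), the acceptance probability can be computed as
# A = min(1, exp(-(h_prop - h) + logdet)), e.g. (hypothetical names):
# acc_prob = torch.exp(torch.clamp(-(h_prop - h) + logdet, max=0.0))
```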

## Networks 2D $U(1)$ {auto-animate=true background-color="#1c1c1c"}

- Stack gauge links as `shape`$\left(U_{\mu}\right)$ `= [Nb, 2, Nt, Nx]` $\in \mathbb{C}$

  $$x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$$

  with `shape`$\left(x_{\mu}\right)$ `= [Nb, 2, Nt, Nx, 2]` $\in \mathbb{R}$

- $x$-Network:
    - [$\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{.purple-text}

- $v$-Network:
  - [$\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{.green-text} [$\hspace{2pt}\longleftarrow$ let's look at this]{.dim-text}
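
A minimal sketch of this stacking (hypothetical sizes; plain PyTorch, not the l2hmc-qcd API):

```python
import math
import torch

Nb, Nt, Nx = 16, 8, 8                                  # hypothetical batch / lattice sizes
x = 2 * math.pi * torch.rand(Nb, 2, Nt, Nx) - math.pi  # link angles x_mu(n) in [-pi, pi)
features = torch.stack((torch.cos(x), torch.sin(x)), dim=-1)
print(features.shape)                                  # torch.Size([16, 2, 8, 8, 2])
```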

## $v$-Update[^reverse] {background-color="#1c1c1c"}

- [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:  

$$\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]$$

- [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$:

$$\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}$$

[^reverse]: [Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$]{style="font-size:0.8em;"}

## $x$-Update {background-color="#1c1c1c"}

- [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:

$$\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]$$

- [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$:

$$\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}$$
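
Putting the two updates together, a minimal sketch of one forward ($d = +$) step; this omits the checkerboard masking of the $x$-update used in practice, and `v_net`, `x_net`, `force` are assumed callables, so it is a simplification of the actual l2hmc-qcd update:

```python
import torch

def leapfrog_step(x, v, eps, v_net, x_net, force):
    """One forward generalized leapfrog step -- a minimal sketch.

    v_net(x, v) -> (s_v, t_v, q_v), x_net(x, v) -> (s_x, t_x, q_x),
    and force(x) = F are assumed callables (see text).
    """
    # momentum update: v <- Gamma^+(x, v)
    s, t, q = v_net(x, v)
    v = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (force(x) * torch.exp(eps * q) + t)
    # position update: x <- Lambda^+(x, v)
    s, t, q = x_net(x, v)
    x = x * torch.exp(0.5 * eps * s) - 0.5 * eps * (v * torch.exp(eps * q) + t)
    # second momentum update with the updated x
    s, t, q = v_net(x, v)
    v = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (force(x) * torch.exp(eps * q) + t)
    return x, v
```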


## Lattice Gauge Theory (2D $U(1)$) {.centeredslide background-color="#1c1c1c"}

:::: {.columns layout-valign="top"}

::: {.column width="50%"}

::: {style="text-align:center;"}

::: {.callout-note icon=false collapse=false title="🔗 Link Variables" style="width:100%!important; text-align:left;"}
$$U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)$$
:::

::: {}

::: {.callout-important icon=false collapse=false title="🫸 Wilson Action" style="width:100%!important; text-align:left;"}
$$S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},$$

$$\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu})
- x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]$$
:::

[**Note**: $e^{i\textcolor{#00CCFF}{x_{P}}}$ is the product of
links around a $1\times 1$ square, called a ["plaquette"]{.blue-text} (see the sketch below)]{.dim-text style=font-size:0.8em;}
:::

:::

:::

::: {.column width="50%"}

![2D Lattice](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/u1lattice.dark.svg){width="80%"}

:::

::::
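
As referenced in the note above, a minimal PyTorch sketch of computing $x_{P}$ and $S_{\beta}$ from the stacked link angles (shape conventions as in the networks slide; an illustrative sketch, not the l2hmc-qcd implementation, with signs following the slide):

```python
import torch

def plaquettes(x: torch.Tensor) -> torch.Tensor:
    """x: link angles, shape [Nb, 2, Nt, Nx] -> x_P, shape [Nb, Nt, Nx]."""
    x0, x1 = x[:, 0], x[:, 1]  # mu = 0 (temporal) and nu = 1 (spatial) links
    # x_P = x_mu(n) + x_nu(n + mu-hat) - x_mu(n + nu-hat) - x_nu(n)
    return (x0
            + torch.roll(x1, shifts=-1, dims=1)   # x_nu shifted one site in mu
            - torch.roll(x0, shifts=-1, dims=2)   # x_mu shifted one site in nu
            - x1)

def wilson_action(x: torch.Tensor, beta: float) -> torch.Tensor:
    """S_beta(x) = beta * sum_P cos(x_P), one value per batch element."""
    return beta * torch.cos(plaquettes(x)).sum(dim=(1, 2))
```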


## {background-color="white"}

::: {#fig-notebook}

<iframe data-src="https://nbviewer.org/github/saforem2/l2hmc-qcd/blob/SU3/src/l2hmc/notebooks/l2hmc-2dU1.ipynb" width="100%" height="650" title="l2hmc-qcd"></iframe>

Jupyter Notebook

:::

## Annealing Schedule {#sec-annealing-schedule background-color="#1c1c1c"}
 
- Introduce an _annealing schedule_ during the training phase:

  $$\left\{ \gamma_{t}  \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1},
  \ldots, \gamma_{N-1}, \gamma_{N} \right\}$$

  where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$  

- [**Note**]{.green-text}: 
    - for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the
      height of the energy barriers $\Longrightarrow$
    - easier for our sampler to explore previously inaccessible regions of the phase space
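
A minimal sketch of constructing such a schedule (the linear spacing and $\gamma_{0} = 0.1$ are arbitrary illustrative choices):

```python
import numpy as np

n_steps = 1000
gammas = np.linspace(0.1, 1.0, n_steps)  # gamma_0 < gamma_1 < ... < gamma_N = 1
# consecutive increments are small: |gamma_{t+1} - gamma_t| << 1
assert np.all(np.diff(gammas) > 0) and np.all(np.diff(gammas) < 1e-2)
# at training step t, target the rescaled distribution p_t(x) ~ exp(-gamma_t * S_beta(x)),
# which flattens the energy barriers while gamma_t < 1
```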


## Networks 2D $U(1)$ {#sec-networks-2dU1 background-color="#1c1c1c"}

- Stack gauge links as `shape`$\left(U_{\mu}\right)$ `= [Nb, 2, Nt, Nx]` $\in \mathbb{C}$

  $$x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$$

  with `shape`$\left(x_{\mu}\right)$ `= [Nb, 2, Nt, Nx, 2]` $\in \mathbb{R}$

- $x$-Network:
    - [$\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{.purple-text}

- $v$-Network:
    - [$\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{.green-text}

## Toy Example: GMM $\in \mathbb{R}^{2}$ {.centeredslide background-color="#1c1c1c"}

![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/iso_gmm_chains.svg){#fig-gmm .r-stretch}

## Physical Quantities {background-color="#1c1c1c"}

- To estimate physical quantities, we:
  - Calculate physical observables at **increasing** spatial resolution
  - Perform extrapolation to continuum limit

::: {#fig-continuum}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/static/continuum.svg)

Increasing the physical resolution ($a \rightarrow 0$) allows us to make
predictions about numerical values of physical quantities in the continuum
limit.

:::
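
A minimal sketch of this extrapolation with hypothetical numbers, assuming leading $\mathcal{O}(a^{2})$ discretization errors:

```python
import numpy as np

# hypothetical measurements of an observable at decreasing lattice spacing a
a   = np.array([0.40, 0.30, 0.20, 0.10])
obs = np.array([1.32, 1.21, 1.13, 1.08])

# assuming obs(a) ~ c0 + c1 * a^2, fit and read off the a -> 0 intercept
c1, c0 = np.polyfit(a**2, obs, deg=1)
print(f"continuum estimate (a -> 0): {c0:.3f}")
```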

## Extra {#sec-extra background-color="#1c1c1c"}

[![](./assets/thumbnail.png)]{.preview-image style="text-align:center; margin-left:auto; margin-right: auto;"}