
MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

Sam Foreman (ALCF)
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Published: July 31, 2023 @ Lattice 2023
Modified: March 29, 2025

Overview

  1. Background: {MCMC,HMC}
    • Leapfrog Integrator
    • Issues with HMC
    • Can we do better?
  2. L2HMC: Generalizing MD
    • 4D SU(3) Model
    • Results
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

Note 🎯 Goal

Generate independent samples $\{x_{i}\}$, such that¹
$$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$
where $S(x)$ is the action (or potential energy)

  • Want to calculate observables $\mathcal{O}$:
    $$\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\, \mathcal{O}(x)\, p(x)$$

If these were independent, we could approximate $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})$, with
$$\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}\left[\mathcal{O}(x)\right] \Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$$

Instead, nearby configs are correlated, and we incur a factor of $\tau^{\mathcal{O}}_{\mathrm{int}}$:
$$\sigma_{\mathcal{O}}^{2} = \frac{\tau^{\mathcal{O}}_{\mathrm{int}}}{N}\mathrm{Var}\left[\mathcal{O}(x)\right]$$
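To make the error formulas concrete, here is a minimal NumPy sketch (not from the talk) that estimates $\tau^{\mathcal{O}}_{\mathrm{int}}$ from a chain of measurements using a Sokal-style self-consistent window, then uses it to correct the naive $1/\sqrt{N}$ error bar:

```python
import numpy as np

def integrated_autocorr_time(obs: np.ndarray, c: float = 5.0) -> float:
    """Estimate tau_int for a 1D chain of measurements of an observable."""
    x = obs - obs.mean()
    n = len(x)
    # Autocorrelation function via direct products (O(n^2); fine for a sketch)
    acf = np.array([np.dot(x[: n - t], x[t:]) / (n - t) for t in range(n // 2)])
    rho = acf / acf[0]
    tau = 1.0
    for t in range(1, len(rho)):
        tau += 2.0 * rho[t]
        if t >= c * tau:  # self-consistent truncation window
            break
    return tau

def mcmc_error(obs: np.ndarray) -> float:
    """sigma_O = sqrt(tau_int * Var[O] / N), per the formula above."""
    return np.sqrt(integrated_autocorr_time(obs) * obs.var() / len(obs))
```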

Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states:
    $$x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}$$

    such that, as $N \rightarrow \infty$:
    $$\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}$$

Tip 🪄 Trick
  • Introduce fictitious momentum $v \sim \mathcal{N}(0, \mathbb{1})$
    • Normally distributed independent of $x$, i.e.
      $$p(x, v) = p(x)\,p(v) \propto e^{-S(x)}\, e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}v\right]} = e^{-H(x, v)}$$

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$❗

  • Write the joint distribution $p(x, v)$:
    $$p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}$$

Tip 🔋 Hamiltonian Dynamics

$$H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H$$

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Tip 🔋 Hamiltonian Dynamics

$$\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$$

Note 🐸 Leapfrog Step

input $\left(x, v\right) \rightarrow \left(x', v'\right)$ output

$$\begin{align*}
\tilde{v} &:= \Gamma(x, v) = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
x' &:= \Lambda(x, \tilde{v}) = x + \varepsilon\,\tilde{v} \\
v' &:= \Gamma(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x')
\end{align*}$$

Warning ⚠️ Warning!
  • Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$ at the beginning of each trajectory

Note: $\partial_{x} S(x)$ is the force
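In code, the three maps $\Gamma$, $\Lambda$, $\Gamma$ compose into a single step. A minimal NumPy sketch, assuming a user-supplied grad_S that returns the force $\partial_{x}S(x)$:

```python
import numpy as np

def leapfrog_step(x, v, eps, grad_S):
    """One leapfrog step (x, v) -> (x', v') for H = S(x) + v^T v / 2."""
    v_tilde = v - 0.5 * eps * grad_S(x)          # Gamma: half-step momentum kick
    x_new = x + eps * v_tilde                    # Lambda: full-step position drift
    v_new = v_tilde - 0.5 * eps * grad_S(x_new)  # Gamma: second half-step kick
    return x_new, v_new
```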

HMC Update

  • We build a trajectory of $N_{\mathrm{LF}}$ leapfrog steps²
    $$(x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v')$$

  • And propose $x'$ as the next state in our chain

    $$\begin{align*}
    \Gamma: (x, v) \rightarrow v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
    \Lambda: (x, v) \rightarrow x' &:= x + \varepsilon v
    \end{align*}$$

  • We then accept / reject $x'$ using the Metropolis-Hastings criteria,
    $$A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$$
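Putting the pieces together, a full HMC update is a short loop over the leapfrog_step sketched above, followed by a Metropolis-Hastings test. A sketch for 1D NumPy arrays (the leapfrog Jacobian is exactly 1, so only $e^{-\Delta H}$ appears):

```python
def hmc_update(x, eps, n_lf, S, grad_S, rng):
    """One HMC update: resample v, integrate n_lf leapfrog steps, accept/reject."""
    v = rng.normal(size=x.shape)  # v0 ~ N(0, 1), resampled every trajectory
    x_new, v_new = x, v
    for _ in range(n_lf):
        x_new, v_new = leapfrog_step(x_new, v_new, eps, grad_S)
    dH = (S(x_new) + 0.5 * v_new @ v_new) - (S(x) + 0.5 * v @ v)
    return x_new if rng.random() < np.exp(-dH) else x  # MH accept / reject

# usage: rng = np.random.default_rng(0); x = hmc_update(x, 0.1, 10, S, grad_S, rng)
```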

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly → slow mixing
    • Cannot easily traverse low-density zones → slow convergence

HMC samples with $\varepsilon=0.25$

HMC samples with $\varepsilon=0.5$

Figure 3: HMC samples generated with varying step sizes $\varepsilon$

Topological Freezing

Topological Charge:
$$Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}$$

note: $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$

Important ⏳ Critical Slowing Down
  • $Q$ gets stuck!
    • as $\beta \longrightarrow \infty$:
      • $Q \longrightarrow \text{const.}$
      • $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \Longrightarrow$
    • # configs required to estimate errors grows exponentially: $\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$

Note $\delta Q \rightarrow 0$ at increasing $\beta$
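For the 2D $U(1)$ model, $Q$ is cheap to measure. A sketch assuming x_p holds the plaquette angles with the two lattice dimensions last:

```python
def topological_charge(x_p):
    """Q = (1/2pi) * sum_P floor(x_P), with x_P first projected into (-pi, pi]."""
    proj = x_p - 2 * np.pi * np.floor((x_p + np.pi) / (2 * np.pi))
    return np.rint(proj.sum(axis=(-2, -1)) / (2 * np.pi))  # integer-valued
```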

Can we do better?

  • Introduce two (invertible NNs) vNet and xNet³:
    • vNet: $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$
    • xNet: $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$

  • Use these $(s, t, q)$ in the generalized MD update:
    • $\Gamma_{\theta}^{\pm}: (x, v) \xrightarrow[]{s_{v}, t_{v}, q_{v}} (x, v')$
    • $\Lambda_{\theta}^{\pm}: (x, v) \xrightarrow[]{s_{x}, t_{x}, q_{x}} (x', v)$

Figure 4: Generalized MD update where $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ are invertible NNs

L2HMC: Generalizing the MD Update

  • Introduce a directional variable $d \sim \mathcal{U}(\pm)$ to determine the direction of our update:

    1. $v' = \Gamma^{\pm}(x, v)$  (update $v$)
    2. $x' = x_{B} + \Lambda^{\pm}(x_{A}, v')$  (update first half: $x_{A}$)
    3. $x'' = x'_{A} + \Lambda^{\pm}(x'_{B}, v')$  (update other half: $x_{B}$)
    4. $v'' = \Gamma^{\pm}(x'', v')$  (update $v$)

  • Resample both $v \sim \mathcal{N}(0, 1)$ and $d \sim \mathcal{U}(\pm)$ at the beginning of each trajectory
    • To ensure ergodicity + reversibility, we split the $x$ update into sequential (complementary) updates
    • Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$

Figure 5: Generalized MD update with $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ invertible NNs
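A minimal sketch of one masked $x$ sub-update, using the explicit $\Lambda^{\pm}$ formulas given in the Extras section; the binary mask m, its complement, and the xnet callable are illustrative assumptions, not the exact l2hmc-qcd API:

```python
def x_sub_update(x, v, eps, m, xnet, d=+1):
    """Update the complement of mask m, conditioning on the frozen half m * x."""
    mb = 1.0 - m
    s, t, q = xnet(m * x, v)   # learned scale / translation / transformation
    if d > 0:                  # forward: Lambda^+
        xb = x * np.exp(0.5 * eps * s) - 0.5 * eps * (v * np.exp(eps * q) + t)
    else:                      # backward: Lambda^- = (Lambda^+)^{-1}
        xb = np.exp(-0.5 * eps * s) * (x + 0.5 * eps * (v * np.exp(eps * q) + t))
    return m * x + mb * xb
```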

L2HMC: Leapfrog Layer

L2HMC Update

Important 👨‍💻 Algorithm

  1. input: $x$
    • Resample: $v \sim \mathcal{N}(0, \mathbb{1})$; $d \sim \mathcal{U}(\pm)$
    • Construct initial state: $\xi = (x, v, \pm)$
  2. forward: Generate proposal $\xi'$ by passing initial $\xi$ through $N_{\mathrm{LF}}$ leapfrog layers
    $$\xi \xrightarrow[]{\mathrm{LF\,layer}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \xi' := (x'', v'')$$
    • Accept / Reject:
      $$A(\xi'|\xi) = \min\left\{1, \frac{\pi(\xi')}{\pi(\xi)}\left|\mathcal{J}\left(\xi', \xi\right)\right|\right\}$$
  3. backward (if training):
    • Evaluate the loss function⁴ $\mathcal{L} \gets \mathcal{L}_{\theta}(\xi', \xi)$ and backprop
  4. return: $x_{i+1}$
    Evaluate the MH criteria and return the accepted config,
    $$x_{i+1} \gets \begin{cases} x'' & \text{w/ prob } A(\xi''|\xi) \hspace{10pt} ✅ \\ x & \text{w/ prob } 1 - A(\xi''|\xi) \hspace{10pt} 🚫 \end{cases}$$

Figure 6: Leapfrog Layer used in generalized MD update

4D SU(3) Model

Note 🔗 Link Variables
  • Write link variables $U_{\mu}(x) \in SU(3)$:

    $$\begin{align*}
    U_{\mu}(x) &= \exp\left[i\,\omega^{k}_{\mu}(x)\,\lambda^{k}\right] \\
    &= e^{iQ}, \quad \text{with} \quad Q \in \mathfrak{su}(3)
    \end{align*}$$

    where $\omega^{k}_{\mu}(x) \in \mathbb{R}$, and $\lambda^{k}$ are the generators of $SU(3)$

Tip 🏃‍♂️‍➡️ Conjugate Momenta
  • Introduce $P_{\mu}(x) = P^{k}_{\mu}(x)\,\lambda^{k}$, conjugate to $\omega^{k}_{\mu}(x)$

Important 🟥 Wilson Action

$$S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]$$

where $U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x+\hat{\mu})\, U^{\dagger}_{\mu}(x+\hat{\nu})\, U^{\dagger}_{\nu}(x)$

Figure 7: Illustration of the lattice

HMC: 4D SU(3)

Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

  • $U$ update: $\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}}$
    $$\frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$$
    $$\begin{align*}
    Q(\varepsilon) &= Q(0) + \varepsilon P(0) \Longrightarrow \\
    -i\,\log U(\varepsilon) &= -i\,\log U(0) + \varepsilon P(0) \\
    U(\varepsilon) &= e^{i\,\varepsilon P(0)}\, U(0) \Longrightarrow \\
    \Lambda:\ U \longrightarrow U' &:= e^{i\varepsilon P'}\, U
    \end{align*}$$

    $\varepsilon$ is the step size

  • $P$ update: $\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}}$
    $$\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow$$
    $$\begin{align*}
    P(\varepsilon) &= P(0) - \varepsilon \left.\frac{dS}{dQ}\right|_{t=0} \\
    &= P(0) - \varepsilon\,F[U] \\
    \Gamma:\ P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U]
    \end{align*}$$

    $F[U]$ is the force term

  • In practice, we maintain a batch of Nb lattices, all updated in parallel
    • $U$.dtype = complex128
    • $U$.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
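The link update translates almost directly into PyTorch, since torch.matrix_exp and @ both broadcast over the leading batch and lattice dimensions. A sketch, assuming P is a Hermitian, traceless ($\mathfrak{su}(3)$-valued) tensor with the same shape as U:

```python
import torch

def link_update(U, P, eps):
    """Lambda: U -> U' = exp(i * eps * P) @ U, applied to every link in the batch."""
    return torch.matrix_exp(1j * eps * P) @ U

# Example shapes: a batch of Nb lattices of SU(3) matrices
Nb, Nt, Nx, Ny, Nz = 2, 4, 4, 4, 4
U = torch.eye(3, dtype=torch.complex128).expand(Nb, 4, Nt, Nx, Ny, Nz, 3, 3)
```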

Networks 4D SU(3)

$U$-Network:

  UNet: $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$

$P$-Network:

  PNet: $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$ ⟵ let's look at this

P-Network (pt. 1)

  • input⁵: $(U, F) := (e^{iQ}, F)$
    $$\begin{align*}
    h_{0} &= \sigma\left(w_{Q} Q + w_{F} F + b\right) \\
    h_{1} &= \sigma\left(w_{1} h_{0} + b_{1}\right) \\
    &\vdots \\
    h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\
    z &:= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow
    \end{align*}$$
  • output⁶: $(s_{P}, t_{P}, q_{P})$
    • $s_{P} = \lambda_{s} \tanh(w_{s} z + b_{s})$
    • $t_{P} = w_{t} z + b_{t}$
    • $q_{P} = \lambda_{q} \tanh(w_{q} z + b_{q})$
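As a PyTorch sketch of the same architecture (the layer count, widths, and flattened-input convention are assumptions; the actual l2hmc-qcd networks differ in detail):

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Sketch of the P-network: (Q, F) -> (s_P, t_P, q_P)."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.w_q = nn.Linear(dim, hidden)   # w_Q Q (+ shared bias b)
        self.w_f = nn.Linear(dim, hidden)   # w_F F
        self.h = nn.Linear(hidden, hidden)
        self.w_s = nn.Linear(hidden, dim)
        self.w_t = nn.Linear(hidden, dim)
        self.w_qout = nn.Linear(hidden, dim)
        self.lambda_s = nn.Parameter(torch.ones(1))  # trainable output scales
        self.lambda_q = nn.Parameter(torch.ones(1))

    def forward(self, Q, F):
        h0 = torch.relu(self.w_q(Q) + self.w_f(F))
        z = torch.relu(self.h(h0))
        s = self.lambda_s * torch.tanh(self.w_s(z))
        t = self.w_t(z)
        q = self.lambda_q * torch.tanh(self.w_qout(z))
        return s, t, q
```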

P-Network (pt. 2)

  • Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)$⁷:

    • forward $(d = +)$:
      $$\Gamma^{+}(U, P) := P_{+} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]$$

    • backward $(d = -)$:
      $$\Gamma^{-}(U, P) := P_{-} = e^{-\frac{\varepsilon}{2} s_{P}}\left\{P + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]\right\}$$
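Both directions fit in one small function; applying d=+1 and then d=-1 should recover the input exactly, which is worth unit-testing in any implementation. A sketch with elementwise tensors assumed:

```python
def momentum_update(P, F, s, t, q, eps, d=+1):
    """Gamma^{±}: scale-and-shift momentum update; Gamma^- exactly inverts Gamma^+."""
    if d > 0:  # forward
        return P * torch.exp(0.5 * eps * s) - 0.5 * eps * (F * torch.exp(eps * q) + t)
    # backward
    return torch.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * torch.exp(eps * q) + t))
```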

Results: 2D U(1)

Important 📈 Improvement

We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the trained model vs. HMC.

Note: lower is better

Interpretation

Deviation in $x_{P}$ · Topological charge mixing · Artificial influx of energy

Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory.

Interpretation

Average plaquette: $\langle x_{P}\rangle$ vs. LF step

Average energy: $H - \sum\log|\mathcal{J}|$

Figure 9: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D SU(3) Results

  • Distribution of $\log|\mathcal{J}|$ over all chains, at each leapfrog step, $N_{\mathrm{LF}}$ ($= 0, 1, \ldots, 8$), during training:
Figure 10: 100 train iters
Figure 11: 500 train iters
Figure 12: 1000 train iters

4D SU(3) Results: $\delta U_{\mu\nu}$

Figure 13: The difference in the average plaquette, $\left|\delta U_{\mu\nu}\right|^{2}$, between the trained model and HMC

Next Steps

  • Further code development

    • saforem2/l2hmc-qcd
  • Continue to use / test different network architectures

    • Gauge equivariant NNs for the $U_{\mu}(x)$ update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

samforeman.me

saforem2

@saforem2

[email protected]

Note 🙏 Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Links:
    • Link to github
    • reach out!
  • References:
    • Link to slides
      • link to github with slides
    • (Foreman et al. 2022; Foreman, Jin, and Osborn 2022, 2021)
    • (Boyda et al. 2022; Shanahan et al. 2022)
  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

Links

  • saforem2/l2hmc-qcd

  • 📊 slides (Github: saforem2/lattice23)

References

  • Title Slide Background (worms) animation
    • Github: saforem2/grid-worms-animation
  • Link to HMC demo

References


Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

(a) Trained model
(b) Generic HMC
Figure 16: Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for (a) the trained model vs. (b) HMC

Plaquette analysis: $x_{P}$

Deviation from the $V \rightarrow \infty$ limit, $x_{P}^{\ast}$

Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted lines)

Figure 17: Plot showing how the average plaquette, $\left\langle x_{P}\right\rangle$, varies over a single trajectory for models trained at different $\beta$, with varying trajectory lengths $N_{\mathrm{LF}}$

Loss Function

  • Want to maximize the expected squared charge difference⁸:
    $$\mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\left[-\delta Q^{2}\left(\xi^{\ast}, \xi\right) \cdot A(\xi^{\ast}|\xi)\right]$$

  • Where:

    • $\delta Q$ is the tunneling rate:
      $$\delta Q(\xi^{\ast}, \xi) = \left|Q^{\ast} - Q\right|$$

    • $A(\xi^{\ast}|\xi)$ is the probability⁹ of accepting the proposal $\xi^{\ast}$:
      $$A(\xi^{\ast}|\xi) = \min\left(1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$$
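In code the loss is essentially one line; gradients flow through both the charge difference and the (differentiable) acceptance probability. A sketch over batched tensors:

```python
def l2hmc_loss(q_prop, q_init, acc_prob):
    """L = E[ -(delta Q)^2 * A ]; minimizing this maximizes expected tunneling."""
    dq2 = (q_prop - q_init) ** 2
    return -(dq2 * acc_prob).mean()
```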

Networks 2D U(1)

  • Stack gauge links as shape$(U_{\mu})$ = [Nb, 2, Nt, Nx] $\in \mathbb{C}$

    $$x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$$

    with shape$(x_{\mu})$ = [Nb, 2, Nt, Nx, 2] $\in \mathbb{R}$

  • $x$-Network:
    • $\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$
  • $v$-Network:
    • $\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$ ⟵ let's look at this
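The stacking itself is a one-liner; representing each angle by $(\cos x, \sin x)$ keeps the network input periodic in $x$. A sketch of the assumed shapes:

```python
import torch

Nb, Nt, Nx = 64, 16, 16
x = torch.empty(Nb, 2, Nt, Nx).uniform_(-torch.pi, torch.pi)  # link angles
x_stacked = torch.stack([x.cos(), x.sin()], dim=-1)           # [Nb, 2, Nt, Nx, 2]
```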

v-Update¹⁰

  • forward $(d = +)$:
    $$\Gamma^{+}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]$$

  • backward $(d = -)$:
    $$\Gamma^{-}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}}\left\{v + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]\right\}$$

x-Update

  • forward $(d = +)$:
    $$\Lambda^{+}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]$$

  • backward $(d = -)$:
    $$\Lambda^{-}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}}\left\{x + \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]\right\}$$

Lattice Gauge Theory (2D U(1))

Note 🔗 Link Variables

$$U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C}, \quad \text{where} \quad x_{\mu}(n) \in [-\pi, \pi)$$

Important 🫸 Wilson Action

$$S_{\beta}(x) = \beta \sum_{P} \cos x_{P},$$

$$x_{P} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]$$

Note: $x_{P}$ is the product of links around a $1\times 1$ square, called a "plaquette"

Figure 18: 2D lattice (Jupyter Notebook)
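The plaquette angle and action are a few rolls away in code. A sketch assuming the [Nb, 2, Nt, Nx] angle layout above, with $\mu = 0$ the temporal direction:

```python
def wilson_action_2d(x, beta):
    """S_beta(x) = beta * sum_P cos(x_P) for 2D U(1); x.shape = [Nb, 2, Nt, Nx]."""
    x0, x1 = x[:, 0], x[:, 1]
    x_p = (x0                                    # x_mu(n)
           + torch.roll(x1, shifts=-1, dims=1)   # + x_nu(n + mu_hat)
           - torch.roll(x0, shifts=-1, dims=2)   # - x_mu(n + nu_hat)
           - x1)                                 # - x_nu(n)
    return beta * x_p.cos().sum(dim=(1, 2))      # one value per batch element
```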

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    $$\left\{\gamma_{t}\right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N}\right\}$$

    where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$

  • Note:

    • for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the height of the energy barriers ⟹
    • easier for our sampler to explore previously inaccessible regions of the phase space
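One simple concrete choice (the talk does not pin down the exact schedule, so this linear ramp is an assumption):

```python
import numpy as np

n_eras = 100
gammas = np.linspace(0.1, 1.0, n_eras)  # gamma_0 < ... < gamma_N = 1
# During era t, train against the rescaled action gamma_t * S(x),
# so early eras see flattened energy barriers.
```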


Toy Example: GMM $\in \mathbb{R}^{2}$

Figure 19

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to continuum limit
Figure 20: Increasing the physical resolution ($a \rightarrow 0$) allows us to make predictions about numerical values of physical quantities in the continuum limit.


Footnotes

  1. Here, $\sim$ means "is distributed according to"

  2. We always start by resampling the momentum, $v_{0} \sim \mathcal{N}(0, \mathbb{1})$

  3. L2HMC: (Foreman, Jin, and Osborn 2021, 2022)

  4. For the simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} = A(\xi^{\ast}|\xi) \cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$

  5. $\sigma(\cdot)$ denotes an activation function

  6. $\lambda_{s}, \lambda_{q} \in \mathbb{R}$ are trainable parameters

  7. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$

  8. Where $\xi^{\ast}$ is the proposed configuration (prior to Accept / Reject)

  9. And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the Jacobian of the transformation from $\xi \rightarrow \xi^{\ast}$

  10. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$

Citation

For attribution, please cite this work as:
Foreman, Sam. 2023. "MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory." Presentation at the 2023 International Symposium on Lattice Field Theory, July 31. https://indico.fnal.gov/event/57249/contributions/271305/. https://saforem2.github.io/lattice23.
AuroraGPT
Training LLMs at Scale
Source Code
---
title: "MLMC: Machine Learning Monte Carlo"
location: Lattice 2023 (Fermilab)
location-url: "https://indico.fnal.gov/event/57249/contributions/271305/"
# location: "[Lattice 2023](https://indico.fnal.gov/event/57249/contributions/271305/)"
image: "https://github.com/saforem2/lattice23/blob/main/assets/thumbnail.png?raw=true"
date: 2023-07-31
categories:
    - ai4science
    - MCMC
    - Lattice QCD
# date-modified: last-modified
title-block-categories: false
number-sections: false
bibliography: references.bib
appendix-cite-as: display
favicon: "./assets/favicon.svg"
callout-style: simple
twitter-card:
  image: "https://github.com/saforem2/lattice23/blob/main/assets/thumbnail.png?raw=true"
  site: "saforem2"
  creator: "saforem2"
citation:
   author: Sam Foreman
   type: speech
   genre: "Presentation at the 2023 International Symposium on Lattice Field Theory"
   container-title: https://indico.fnal.gov/event/57249/contributions/271305/
   title: "MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory"
   url: https://saforem2.github.io/lattice23
   abstract: | 
     We present a trainable framework for efficiently generating gauge
     configurations, and discuss ongoing work in this direction. In particular, we
     consider the problem of sampling configurations from a 4D 𝑆𝑈(3) lattice gauge
     theory, and consider a generalized leapfrog integrator in the molecular
     dynamics update that can be trained to improve sampling efficiency.
format:
  html:
    # reference-location: section
    # toc-location: right
    page-layout: full
    # grid:
    #   body-width: 800px
  revealjs:
    slides-url: https://samforeman.me/talks/lattice23/slides.html
    # template-partials:
    #   - ./title-slide.html
      # - ./title-fancy/title-slide.html
      # - ./title_slide_template.html
      # - ../../title-slide.html
    title-slide-attributes:
      data-background-iframe: https://saforem2.github.io/grid-worms-animation/
      data-background-size: contain
      data-background-color: "#1c1c1c"
      background-color: "#1c1c1c"
    title-block-style: none
    slide-number: c
    title-slide-style: default
    chalkboard:
      buttons: false
    auto-animate: true
    reference-location: section
    touch: true
    pause: false
    footnotes-hover: true
    citations-hover: true
    preview-links: true
    controls-tutorial: true
    controls: false
    logo: "https://raw.githubusercontent.com/saforem2/anl-job-talk/main/docs/assets/anl.svg"
    history: false
    # theme: [css/dark.scss]
    callout-style: simple
    # css: [css/default.css, css/callouts.css]
    fig-align: center
    css:
      # - css/default.css
      - ../../css/custom.css
    theme:
      # - light:
      - white
      - ../../css/title-slide-template.scss
      - ../../css/reveal/reveal.scss
      - ../../css/common.scss
      - ../../css/dark.scss
      - ../../css/syntax-dark.scss
      - ../../css/callout-cards.scss
    # callout-style: simple
    # css:
    #   # - css/default.css
    #   - ../../css/custom.css
    # theme:
    #   # - light:
    #   # - css/dark.scss
    #   - ../../css/title-slide-template.scss
    #   - ../../css/reveal/reveal.scss
    #   - ../../css/common.scss
    #   - ../../css/dark.scss
    #   - ../../css/syntax-dark.scss
    #   - ../../css/callout-cards.scss
    # css:
    #   # - css/default.css
    #   - ../../css/custom.css
    # theme:
    #   # - black
    #   # - ./title-fancy/title-slide-template.scss
    #   - ../../css/reveal/reveal.scss
    #   - ../../css/common.scss
    #   - ../../css/dark.scss
    #   - ../../css/syntax-dark.scss
    #   - ../../css/callout-cards.scss
    #   - css/dark.scss
    self-contained: false
    embed-resources: false
    self-contained-math: false
    center: true
    highlight-style: "atom-one"
    default-image-extension: svg
    code-line-numbers: true
    data-background-color: "#1c1c1c"
    background-color: "#1c1c1c"
    code-overflow: scroll
    html-math-method: katex
    output-file: "slides.html"
    mermaid:
      theme: neutral
  # gfm:
  #   output-file: "lattice23.md"
---

# {.title-slide background-color="#1c1c1c" background-iframe="https://saforem2.github.io/grid-worms-animation/" loading="lazy"}

::: {style="background-color: rgba(22,22,22,0.75); border-radius: 10px; text-align:center; padding: 0px; padding-left: 1.5em; padding-right: 1.5em; max-width: min-content; min-width: max-content; margin-left: auto; margin-right: auto; padding-top: 0.2em; padding-bottom: 0.2em; line-height: 1.5em!important;"}
<span style="color:#939393; font-size:1.5em; font-weight: bold;">MLMC: Machine Learning Monte Carlo</span>  
<span style="color:#777777; font-size:1.2em; font-weight: bold;">for Lattice Gauge Theory</span>  
[<br>&nbsp;]{style="padding-bottom: 0.5rem;"}  
[{{< fa solid home >}}](https://samforeman.me) Sam Foreman  
[Xiao-Yong Jin, James C. Osborn]{.dim-text style="font-size:0.8em;"}  
[[[{{< fa brands github >}} `saforem2/`](https://github.com/saforem2/)]{style="border-bottom: 0.5px solid #00ccff;"}`{`[[`lattice23`](https://github.com/saforem2/lattice23)]{style="border-bottom: 0.5px solid #00ccff;"}, [[`l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)]{style="border-bottom: 0.5px solid #00ccff;"}`}`]{style="font-size:0.8em;"}
:::

::: footer
[2023-07-31 @ [Lattice 2023](https://indico.fnal.gov/event/57249/contributions/271305/)]{.dim-text style="text-align:left;'}
:::

# Overview {background-color="#1c1c1c"}

1. [Background: `{MCMC,HMC}`](#markov-chain-monte-carlo-mcmc-centeredslide)
    - [Leapfrog Integrator](#leapfrog-integrator-hmc-centeredslide)
    - [Issues with HMC](#sec-issues-with-hmc)
    - [Can we do better?](#sec-can-we-do-better)
2. [L2HMC: Generalizing MD](#sec-l2hmc)
    - [4D $SU(3)$ Model](#sec-su3)
    - [Results](#sec-results)
3. [References](#sec-references)
4. [Extras](#sec-extras)


# Markov Chain Monte Carlo (MCMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🎯 Goal" style="text-align:left;!important; width: 100%!important;"}
Generate **independent** samples $\{x_{i}\}$, such that[^notation1]
$$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$
where $S(x)$ is the _action_ (or potential energy)
:::

- Want to calculate observables $\mathcal{O}$:  
  $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}$

:::

::: {.column width="49%"}
![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/normal_distribution.dark.svg)
:::

::::

If these were <span style="color:#00CCFF;">independent</span>, we could approximate:
  $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})$  
  $$\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x)
  \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$$

[^notation1]: Here, $\sim$ means "is distributed according to"

::: footer
[{{< fa brands github >}} `saforem2/lattice23`](https://saforem2.github.io/lattice23)
:::

# Markov Chain Monte Carlo (MCMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🎯 Goal" style="text-align:left;!important; width: 100%!important;"}
Generate **independent** samples $\{x_{i}\}$, such that[^notation2]
$$\{x_{i}\} \sim p(x) \propto e^{-S(x)}$$
where $S(x)$ is the _action_ (or potential energy)
:::

- Want to calculate observables $\mathcal{O}$:  
  $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}$

:::

::: {.column width="49%"}
![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/normal_distribution.dark.svg)
:::

::::

Instead, nearby configs are [correlated]{.red-text}, and we incur a factor of
$\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}$:
  $$\sigma_{\mathcal{O}}^{2} =
  \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O}
  (x) \right]}$$

[^notation2]: Here, $\sim$ means "is distributed according to"

::: footer
[{{< fa brands github >}} `saforem2/lattice23`](https://github.com/saforem2/lattice23)
:::

# Hamiltonian Monte Carlo (HMC) {.center background-color="#1c1c1c"}

- Want to (sequentially) construct a chain of states:
  $$x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}$$

  such that, as $N \rightarrow \infty$:
  $$\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x)
  \propto e^{-S(x)}$$

::: {.callout-tip icon=false collapse=false title="🪄 Trick" style="display:inline!important; width: 100%!important;"}
  - Introduce [fictitious]{.green-text} momentum $v \sim \mathcal{N}(0, \mathbb{1})$
    - Normally distributed **independent** of $x$, i.e.
  $$\begin{align*}
    p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v}
    = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]}
    \textcolor{#02b875}{=} e^{-H(x, v)}
  \end{align*}$$
:::

## Hamiltonian Monte Carlo (HMC) {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="55%"}

- [**Idea**]{.green-text}: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$❗

- Write the **joint distribution** $p(x, v)$:
  $$
  p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}
  $$
:::

::: {.column width="45%"}

::: {.callout-tip collapse=false icon=false title="🔋 Hamiltonian Dynamics" style="width:100%!important;"}
$H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow$
$$\dot{x} = +\partial_{v} H,
\,\,\dot{v} = -\partial_{x} H$$
:::

:::

::::

::: {#fig-hmc-traj}

![](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/hmc1.svg){.r-stretch}

Overview of HMC algorithm
:::

## Leapfrog Integrator (HMC) {#sec-leapfrog background-color="#1c1c1c"}

:::: {.columns style="font-size: 0.9em;" height="100%"}

::: {.column width="40%"}

::: {.callout-tip collapse=false icon=false title="🔋 Hamiltonian Dynamics" style="width:100%!important;"}
$\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$
:::

::: {.callout-note collapse=false icon=false title="🐸 Leapfrog Step" style="width:100%!important;"}
`input` $\,\left(x, v\right) \rightarrow \left(x', v'\right)\,$ `output`

$$\begin{align*}
\tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, =  x + \varepsilon \, \tilde{v} \\
v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x')
\end{align*}$$

:::

::: {.callout-warning collapse=false icon=false title="⚠️ Warning!" style="width:100%!important;"}
- Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$  
at the [beginning]{.yellow-text} of each trajectory
:::

::: {style="font-size:0.8em; margin-left:13%;"}
[**Note**: $\partial_{x} S(x)$ is the _force_]{.dim-text}
:::

:::

::: {.column width="55%" style="text-align:left;"}

![](./assets/hmc1/hmc-update-light.svg){width=60%}

:::

::::

## HMC Update {background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="65%"}
- We build a trajectory of $N_{\mathrm{LF}}$ **leapfrog steps**[^v0]
  $$\begin{equation*}
  (x_{0}, v_{0})%
  \rightarrow (x_{1}, v_{1})\rightarrow \cdots%
  \rightarrow (x', v')
  \end{equation*}$$

- And propose $x'$ as the next state in our chain

$$\begin{align*}
  \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
  \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v
\end{align*}$$

- We then accept / reject $x'$ using Metropolis-Hastings criteria,  
  $A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$

:::

::: {.column width="30%"}

![](./assets/hmc1/hmc-update-light.svg)

:::

::::

[^v0]: We **always** start by resampling the momentum, $v_{0} \sim
\mathcal{N}(0, \mathbb{1})$

## HMC Demo {.centeredslide background-color="#1c1c1c"}

::: {#fig-hmc-demo}

<iframe data-src="https://chi-feng.github.io/mcmc-demo/app.html" width="90%" height="500" title="l2hmc-qcd"></iframe>

HMC Demo
:::

# Issues with HMC {#sec-issues style="font-size:0.9em;" background-color="#1c1c1c"}

- What do we want in a good sampler?
  - **Fast mixing** (small autocorrelations)
  - **Fast burn-in** (quick convergence)

- Problems with HMC:
  - Energy levels selected randomly $\rightarrow$ **slow mixing**
  - Cannot easily traverse low-density zones $\rightarrow$ **slow convergence**

::: {#fig-hmc-issues layout-ncol=2}
![HMC Samples with $\varepsilon=0.25$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/hmc_traj_eps025.svg)

![HMC Samples with $\varepsilon=0.5$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/hmc_traj_eps05.svg)

HMC Samples generated with varying step sizes $\varepsilon$
:::

# Topological Freezing {.center background-color="#1c1c1c"}

:::: {.flex-container style="text-align: center; width: 100%"}

::: {.col1 width="45%" style="text-align: left; font-size: 0.9em;"}

**Topological Charge**:
$$Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor  \in \mathbb{Z}$$

[**note:** $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi
\left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$]{.dim-text style="font-size:0.8em;"}


::: {.callout-important collapse=false icon=false title="⏳ Critical Slowing Down" style="text-align:left; width:100%!important;"}
- $Q$ gets stuck!
    - as $\beta\longrightarrow \infty$:
        - $Q \longrightarrow \text{const.}$
        - $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}$
    - \# configs required to estimate errors  
    **grows exponentially**:
    [$\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$]{.red-text}
:::

:::

::: {.flex-container width="45%"}

![Note $\delta Q \rightarrow 0$ at increasing
$\beta$](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/critical_slowing_down.svg){width="80%"}

:::

::::

# Can we do better? {#sec-can-we-do-better background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

- Introduce two (**invertible NNs**) `vNet` and `xNet`[^l2hmc]:
  - [`vNet: ` $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{style="font-size:0.9em;"}  
  - [`xNet: ` $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{style="font-size:0.9em;"}

&nbsp;  

- Use these $(s, t, q)$ in the _generalized_ MD update:
  - [[$\Gamma_{\theta}^{\pm}$]{.pink-text} $: ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})$]{}
  - [[$\Lambda_{\theta}^{\pm}$]{.orange-text} $: (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)$]{}

:::

::: {.column width="48%"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:85%; text-align:center;"}

Generalized MD update where [$\Lambda_{\theta}^{\pm}$]{.orange-text},
[$\Gamma_{\theta}^{\pm}$]{.pink-text} are **invertible NNs**

:::

:::

::::

[^l2hmc]: [L2HMC: ](https://github.com/saforem2/l2hmc-qcd) {{< fa solid book >}}
[@Foreman:2021ixr; @Foreman:2021rhs]

# L2HMC: Generalizing the MD Update {#sec-l2hmc .smaller .centeredslide background-color="#1c1c1c"}

:::: {.flex-container style="text-align: left; width: 100%"}

::: {.col1 style="width:45%; font-size: 0.8em;"}

::: {style="border:1px solid red;"}

- Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update

  1. [$\textcolor{#07B875}{v'} =$ [$\Gamma^{\pm}$]{.pink-text}$({x}, \textcolor{#07B875}{v})$]{} [$\hspace{46pt}$ update $v$]{.dim-text style="font-size:0.9em;"}

  2. [$\textcolor{#AE81FF}{x'} =$ [$x_{B}$]{.blue-text}$\,+\,$[$\Lambda^{\pm}$]{.orange-text}$($[$x_{A}$]{.red-text}$, {v'})$]{} [$\hspace{10pt}$ update first **half**: $x_{A}$]{.dim-text style="font-size:0.9em;"}

  3. [$\textcolor{#AE81FF}{x''} =$ [$x'_{A}$]{.red-text}$\,+\,$[$\Lambda^{\pm}$]{.orange-text}$($[$x'_{B}$]{.blue-text}$, {v'})$]{} [$\hspace{8pt}$ update other half: $x_{B}$]{.dim-text style="font-size:0.9em;"}

  4. [$\textcolor{#07B875}{v''} =$ [$\Gamma^{\pm}$]{.pink-text}$({x''}, \textcolor{#07B875}{v'})$]{} [$\hspace{36pt}$ update $v$]{.dim-text style="font-size:0.9em;"}

:::

::: {style="border:1px solid red;"}

- Resample both $v\sim \mathcal{N}(0, 1)$, and $d \sim \mathcal{U}(\pm)$ at the
beginning of each trajectory
  - To ensure ergodicity + reversibility, we split the [$x$]{.purple-text}
  update into sequential (complementary) updates
- Introduce directional variable $d \sim \mathcal{U}(\pm)$, resampled at the
beginning of each trajectory:
  - Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e.
  $$\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x,
  v)\right] = (x, v)$$

:::

:::

::: {.col2 style="width:55%;"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:85%; text-align:center;"}

Generalized MD update with [$\Lambda_{\theta}^{\pm}$]{.orange-text}, [$\Gamma_{\theta}^{\pm}$]{.pink-text} **invertible NNs**
:::

:::

::::

## L2HMC: Leapfrog Layer {.center width="100%" background-color="#1c1c1c"}

:::: {.flex-container}

::: {.column style="width: 35%;"}
![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/update_steps.svg){.absolute  top="30" width="40%"}
:::

::: {.column style="width:65%;"}
![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/leapfrog_layer_dark2.svg){width="100%"}
:::

![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/drawio/network_functions.svg){.absolute top=440 width="100%"}

::::

## L2HMC Update {style="font-size: 0.775em;" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="65%" style="font-size:0.7em;"}

::: {.callout-important collapse=false icon=false title="👨‍💻 Algorithm" style="text-align:left; width: 100%!important;"}

1. `input`: [$x$]{.purple-text}

    - Resample: $\textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1})$; $\,\,{d\sim\mathcal{U}(\pm)}$
    - Construct initial state:
     $\textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})$

2. `forward`: Generate [proposal $\xi'$]{style="color:#f8f8f8"} by passing [initial $\xi$]{style="color:#939393"} through $N_{\mathrm{LF}}$ leapfrog layers  
$$\textcolor{#939393} \xi \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})$$

    - Accept / Reject:
      $$\begin{equation*}
      A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})=
      \mathrm{min}\left\{1,
      \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\}
      \end{equation*}$$

5. `backward` (if training):  
    - Evaluate the **loss function**[^loss] $\mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi})$ and backprop
6. `return`: $\textcolor{#AE81FF}{x}_{i+1}$  
  Evaluate MH criteria $(1)$ and return accepted config, 
  $$\textcolor{#AE81FF}{{x}_{i+1}}\gets
  \begin{cases}
  \textcolor{#f8f8f8}{\textcolor{#AE81FF}{x''}} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi''}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\
  \textcolor{#939393}{\textcolor{#AE81FF}{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi''}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫
  \end{cases}$$

:::

:::

::: {.column width="35%"}

::: {#fig-mdupdate}

![](./assets/leapfrog-layer-2D-U1-vertical.light.svg){style="width:75%; text-align:center;"}

**Leapfrog Layer** used in generalized MD update
:::

:::

::::

[^loss]: 
    For simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} =
    A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$


# 4D $SU(3)$ Model {#sec-su3 .centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

::: {.callout-note collapse=false icon=false title="🔗 Link Variables" style="text-align:left; width: 100%!important;"}

- Write link variables $U_{\mu}(x) \in SU(3)$:

  $$ \begin{align*} 
  U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\
  &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3)
  \end{align*}$$ 

  [where [$\omega^{k}_{\mu}(x)$]{.purple-text} $\in \mathbb{R}$, and $\lambda^{k}$ are
  the generators of $SU(3)$]{style="font-size:0.9em;"}

:::

::: {.callout-tip collapse=false icon=false title="🏃‍♂️‍➡️ Conjugate Momenta" style="text-align:left; width:100%!important;"}

- Introduce [$P_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k}$]{.green-text} conjugate to
[$\omega^{k}_{\mu}(x)$]{.purple-text}

:::

::: {.callout-important collapse=false icon=false title="🟥 Wilson Action" style="text-align:left; width:100%!important;"}

$$ S_{G} = -\frac{\beta}{6} \sum
\mathrm{Tr}\left[U_{\mu\nu}(x)
+ U^{\dagger}_{\mu\nu}(x)\right] $$

where $U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu})
U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)$

:::


:::

::: {.column width="45%"}

::: {#fig-4dlattice}

![](./assets/u1lattice.dark.svg){width="90%"}

Illustration of the lattice
:::

:::

::::

## HMC: 4D $SU(3)$ {#sec-hmcsu3 background-color="#1c1c1c"}

Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

:::: {.columns}

::: {.column style="font-size:0.9em; text-align: center;"}

::: {.callout collapse=false style="text-align:left; width: 100%!important;"}

- [$U$ update]{style="border-bottom: 2px solid #AE81FF;"}:
[$\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}}$]{.purple-text style="font-size:1.5em;"}
$$\frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$$
$$\begin{align*}
Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0)\Longrightarrow\\
-i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\
U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)} U(0)\Longrightarrow \\
&\hspace{1pt}\\
\textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U
\end{align*}$$

:::

::: aside
[$\textcolor{#FFEE58}{\varepsilon}$ is the step size]{.dim-text style="font-size:0.8em;"}
:::

:::

::: {.column style="font-size:0.9em; text-align: center;"}
::: {.callout collapse=false style="text-align:left; width: 100%!important;"}
- [$P$ update]{style="border-bottom: 2px solid #07B875;"}:
[$\frac{dP^{k}}{dt} = - \frac{\partial H}{\partial \omega^{k}}$]{.green-text style="font-size:1.5em;"} 
$$\frac{dP^{k}}{dt} = - \frac{\partial H}{\partial \omega^{k}}
= -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ}\Longrightarrow$$
$$\begin{align*}
P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} \\
&= P(0) - \textcolor{#FFEE58}{\varepsilon} \,\textcolor{#E599F7}{F[U]} \\
&\hspace{1pt}\\
\textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U]
\end{align*}$$
:::

::: aside
[$\textcolor{#E599F7}{F[U]}$ is the force term]{.dim-text style="font-size:0.8em;"}
:::
:::

::::


## HMC: 4D $SU(3)$ {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="47%" style="text-align:left;"}

- [Momentum Update]{style="border-bottom: 2px solid #F06292;"}:
  $$\textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$$

- [Link Update]{style="border-bottom: 2px solid #FD971F;"}:
  $$\textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad$$

- We maintain a batch of `Nb` lattices, all updated in parallel
  - $U$`.dtype = complex128`
  - $U$`.shape`  
    [`= [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]`]{style="font-size: 0.95em;"}

:::

::: {.column width="47%" style="text-align:right;"}

![](./assets/hmc/hmc-update-light.svg){width=60%}

:::

::::

# Networks 4D $SU(3)$ {#sec-su3networks .centeredslide auto-animate="true" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="54%" style="font-size:0.9em;"}

&nbsp;<br>  

&nbsp;<br>  

[$U$]{.purple-text}-Network:

  [`UNet: ` $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$]{style="font-size:0.9em;"}


&nbsp;<br>  

::: {style="border: 1px solid #1c1c1c; border-radius: 6px; padding:1%;"}

[$P$]{.green-text}-Network:

  [`PNet: ` $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$]{style="font-size:0.9em;"}  

:::

:::

::: {.column width="45%" style="text-align:right;"}

![](./assets/leapfrog-layer-4D-SU3-vertical.light.svg){width="80%"}

:::

::::

# Networks 4D $SU(3)$ {.centeredslide auto-animate="true" background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="54%" style="font-size:0.9em;"}

&nbsp;<br>  

&nbsp;<br>  

[$U$]{.purple-text}-Network:

  [`UNet: ` $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$]{style="font-size:0.9em;"}

&nbsp;<br>  

::: {style="border: 1px solid #07B875; border-radius: 6px; padding:1%;"}

[$P$]{.green-text}-Network:

  [`PNet: ` $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$]{style="font-size:0.9em;"}  

:::

[$\uparrow$]{.dim-text}  
[let's look at this]{.dim-text style="padding-top: 0.5em!important;"}

:::

::: {.column width="45%" style="text-align:right;"}

![](./assets/leapfrog-layer-4D-SU3-vertical.light.svg){width="80%"}

:::

::::

## $P$-`Network` (pt. 1) {style="font-size:0.95em;" background-color="#1c1c1c"}

::: {style="text-align:center;"}

![](./assets/SU3/PNetwork.light.svg)

:::

:::: {.columns}

::: {.column width="50%"}

- [`input`[^sigma]: $\hspace{7pt}\left(U, F\right) := (e^{i Q}, F)$]{style="border-bottom: 2px solid rgba(131, 131, 131, 0.493);"}
  $$\begin{align*}
  h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\
  h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\
  &\vdots \\
  h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\
  \textcolor{#FF5252}{z} & := \sigma\left(w_{n+1} h_{n} + b_{n+1}\right) \longrightarrow \\
  \end{align*}$$

:::

::: {.column width="50%"}

- [`output`[^lambda1]: $\hspace{7pt} (s_{P}, t_{P}, q_{P})$]{style="border-bottom: 2px solid rgba(131, 131, 131, 0.5);"}

  - $s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s)$
  - $t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}$
  - $q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})$

:::

::::

[^sigma]: $\sigma(\cdot)$ denotes an activation function
[^lambda1]: $\lambda_{s}, \lambda_{q} \in \mathbb{R}$, trainable parameters
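
Concretely, the stack above is a small MLP with three output heads. A sketch (layer widths, activation, and the input embedding are assumptions for illustration, not the exact `l2hmc-qcd` implementation):

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Sketch of the P-network: (Q, F) -> (s_P, t_P, q_P)."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.wQ = nn.Linear(dim, hidden)              # w_Q Q + b
        self.wF = nn.Linear(dim, hidden, bias=False)  # w_F F
        self.h1 = nn.Linear(hidden, hidden)           # hidden stack h_1 ... h_n
        self.z = nn.Linear(hidden, hidden)            # z
        self.ws = nn.Linear(hidden, dim)
        self.wt = nn.Linear(hidden, dim)
        self.wq = nn.Linear(hidden, dim)
        self.lambda_s = nn.Parameter(torch.ones(dim))  # trainable scales
        self.lambda_q = nn.Parameter(torch.ones(dim))

    def forward(self, Q: torch.Tensor, F: torch.Tensor):
        h = torch.relu(self.wQ(Q) + self.wF(F))  # h_0
        h = torch.relu(self.h1(h))               # one hidden layer here
        z = torch.relu(self.z(h))
        s = self.lambda_s * torch.tanh(self.ws(z))
        t = self.wt(z)
        q = self.lambda_q * torch.tanh(self.wq(z))
        return s, t, q
```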

## $P$-`Network` (pt. 2) {style="font-size:0.9em;" background-color="#1c1c1c"}

::: {style="text-align:center;"}

![](./assets/SU3/PNetwork.light.svg)

:::

- Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow
\left(U, P_{\pm}\right)$[^inverse]:

    - [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:
    $$\Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]$$

    - [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$: 
    $$\Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}$$


[^inverse]: Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$
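
As an elementwise sketch (treating everything as flat real tensors, which glosses over the actual $SU(3)$ structure), the two updates and their inverse relationship look like:

```python
import torch

def gamma_plus(P, F, s, t, q, eps: float):
    """Forward update Gamma^+: P -> P_+ (elementwise sketch)."""
    return P * torch.exp(0.5 * eps * s) - 0.5 * eps * (F * torch.exp(eps * q) + t)

def gamma_minus(P, F, s, t, q, eps: float):
    """Backward update Gamma^-: inverts gamma_plus at fixed (F, s, t, q)."""
    return torch.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * torch.exp(eps * q) + t))
```

Here `gamma_minus(gamma_plus(P, ...), ...)` recovers `P` up to floating-point error, which is exactly the invertibility stated in the footnote.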

# Results: 2D $U(1)$ {#sec-results .centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width=50% style="align:top;"}
![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/autocorr_new.svg){width="90%"}
:::

::: {.column width="33%" style="text-align:left; padding-top: 5%;"}

::: {.callout-important icon=false collapse=false title="📈 Improvement" style="text-align:left!important; width: 100%!important;"}
We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the
[**trained model**]{style="color:#FF2052;"} vs.
[**HMC**]{style="color:#9F9F9F;"}.  
  
**Note**: [lower]{style="color:#FF2052;"} is better
:::

:::

::::

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/charge_histories.svg){.absolute top=400 left=0 width="100%" style="margin-bottom: 1em;margin-top: 1em;"}


## Interpretation {#sec-interpretation .centeredslide background-color="#1c1c1c"}

:::: {.columns style="margin-left:1pt;"}

::: {.column width="36%"}

[Deviation in $x_{P}$]{.dim-text style="text-align:center; font-size:0.8em"}

:::

::: {.column width="30%"}

[Topological charge mixing]{.dim-text style="text-align:right; font-size:0.8em"}

:::

::: {.column width="32%"}

[Artificial influx of energy]{.dim-text style="text-align:right!important; font-size:0.8em;"}

:::

::::

::: {#fig-interpretation}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/ridgeplots.svg){width="100%"}

Illustration of how different observables evolve over a single L2HMC
trajectory.
:::


## Interpretation {.centeredslide background-color="#1c1c1c"}

::: {#fig-energy-ridgeplot layout-ncol=2 layout-valign="top"}

![Average plaquette: $\langle x_{P}\rangle$ vs LF step](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/plaqsf_ridgeplot.svg)

![Average energy: $H - \sum\log|\mathcal{J}|$](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/Hf_ridgeplot.svg)

The trained model artificially increases the energy towards
the middle of the trajectory, allowing the sampler to tunnel between isolated
sectors.
:::


# 4D $SU(3)$ Results {#sec-su3results background-color="#1c1c1c"}


- Distribution of $\log|\mathcal{J}|$ over all chains, at each _leapfrog step_
  $N_{\mathrm{LF}} = 0, 1, \ldots, 8$, at three points during training:

::: {layout="[ [30, 30, 30] ]" layout-valign="center" style="display: flex; flex-direction: row; margin-top: -0.0em; align-items: center;"}

![`100` train iters](./assets/SU3/logdet_ridgeplot1.svg){#fig-ridgeplot1}

![`500` train iters](./assets/SU3/logdet_ridgeplot2.svg){#fig-ridgeplot2}

![`1000` train iters](./assets/SU3/logdet_ridgeplot3.svg){#fig-ridgeplot3}

:::

## 4D $SU(3)$ Results: $\delta U_{\mu\nu}$ {background-color="#1c1c1c"}

::: {#fig-pdiff}

![](./assets/SU3/pdiff.svg)

The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$
between the trained model and HMC

:::

## 4D $SU(3)$ Results: $\delta U_{\mu\nu}$ {background-color="#1c1c1c"}

::: {#fig-pdiff-robust}

![](./assets/SU3/pdiff-robust.svg)

The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$
between the trained model and HMC

:::

# Next Steps {#sec-next-steps background-color="#1c1c1c"}

- Further code development
  - {{< fa brands github >}} [`saforem2/l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)

- Continue to use / test different network architectures
  - Gauge equivariant NNs for $U_{\mu}(x)$ update

- Continue to test different loss functions for training

- Scaling:
  - Lattice volume
  - Network size
  - Batch size
  - \# of GPUs

## Thank you! {#sec-thank-you background-color="#1c1c1c"}

&nbsp;<br>  

::: {layout-ncol=4 style="text-align:left; font-size:0.8em;"}

[[{{< fa solid home >}} `samforeman.me`](https://samforeman.me)]{style="font-size:0.8em;"}

[[{{< fa brands github >}} `saforem2`](https://github.com/saforem2)]{style="font-size:0.8em;"}

[[{{< fa brands twitter >}} `@saforem2`](https://www.twitter.com/saforem2)]{style="font-size:0.8em;"}

[[{{< fa regular paper-plane >}} `[email protected]`](mailto:[email protected])]{style="font-size:0.8em;"}

:::

::: {.callout-note icon=false collapse=false title="🙏 Acknowledgements" style="width: 100%!important;"}

This research used resources of the Argonne Leadership Computing Facility,  
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

:::


## {#sec-l2hmc-gh background-color="#1c1c1c"}

::: {style="text-align:center;"}

[![](https://raw.githubusercontent.com/saforem2/l2hmc-qcd/main/assets/logo-small.svg)](https://github.com/saforem2/l2hmc-qcd)

<a href="https://hits.seeyoufarm.com"><img alt="hits" src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fsaforem2%2Fl2hmc-qcd&count_bg=%2300CCFF&title_bg=%23555555&icon=&icon_color=%23111111&title=👋&edge_flat=false"></a>
<a href="https://github.com/saforem2/l2hmc-qcd/"><img alt="l2hmc-qcd" src="https://img.shields.io/badge/-l2hmc--qcd-252525?style=flat&logo=github&labelColor=gray"></a> <a href="https://www.codefactor.io/repository/github/saforem2/l2hmc-qcd"><img alt="codefactor" src="https://www.codefactor.io/repository/github/saforem2/l2hmc-qcd/badge"></a>  

<a href="https://arxiv.org/abs/2112.01582"><img alt="arxiv" src="http://img.shields.io/badge/arXiv-2112.01582-B31B1B.svg"></a> <a href="https://arxiv.org/abs/2105.03418"><img alt="arxiv" src="http://img.shields.io/badge/arXiv-2105.03418-B31B1B.svg"></a>  

<a href="https://hydra.cc"><img alt="hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a> <a href="https://pytorch.org/get-started/locally/"><img alt="pyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://www.tensorflow.org"><img alt="tensorflow" src="https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?&logo=TensorFlow&logoColor=white"></a> 
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Weights & Biases monitoring" height=20>](https://wandb.ai/l2hmc-qcd/l2hmc-qcd)

:::


## Acknowledgements {#sec-acknowledgements background-color="#1c1c1c"}

:::: {.columns}


::: {.column width="50%"}

- **Links**:
  - [{{< fa brands github >}} Link to github](https://github.com/saforem2/l2hmc-qcd)
  - [{{< fa solid paper-plane >}} reach out!](mailto:[email protected])

- **References**:
  - [Link to slides](https://saforem2.github.io/lattice23/)
    - [{{< fa brands github >}} link to github with slides](https://github.com/saforem2/lattice23)
  - {{< fa solid book >}} [@Foreman:2021ljl; @Foreman:2021rhs; @Foreman:2021ixr]
  - {{< fa solid book >}} [@Boyda:2022nmh; @Shanahan:2022ifi]

:::

::: {.column width="50%"}

- Huge thank you to:
  - Yannick Meurice
  - Norman Christ
  - Akio Tomiya
  - Nobuyuki Matsumoto
  - Richard Brower
  - Luchang Jin
  - Chulwoo Jung
  - Peter Boyle
  - Taku Izubuchi
  - Denis Boyda
  - Dan Hackett
  - ECP-CSD group
  - [**ALCF Staff + Datascience Group**]{.red-text}

:::

::::


## {#sec-references background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="50%"}

### Links {background-color="#1c1c1c"}

- [{{< fa brands github >}} `saforem2/l2hmc-qcd`](https://github.com/saforem2/l2hmc-qcd)

- [📊 slides](https://saforem2.github.io/lattice23) (Github: [{{< fa brands github >}} `saforem2/lattice23`](https://github.com/saforem2/lattice23))

:::

::: {.column width="50%"}

### References {background-color="#1c1c1c"}

- [Title Slide Background (worms) animation](https://saforem2.github.io/grid-worms-animation/)
  - Github: [{{< fa brands github >}} `saforem2/grid-worms-animation`](https://github.com/saforem2/grid-worms-animation)

- [Link to HMC demo](https://chi-feng.github.io/mcmc-demo/app.html)

:::

::::


## References {style="line-height:1.2em;" background-color="#1c1c1c"}

(I don't know why this is broken 🤷🏻‍♂️)

::: {#refs}
:::

# Extras {#sec-extras background-color="#1c1c1c"}

## Integrated Autocorrelation Time {.centeredslide background-color="#1c1c1c"}

::: {#fig-iat}
![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/tint1.svg){width="100%"}

Plot of the integrated autocorrelation time for both the trained model
(colored) and HMC (greyscale).
:::
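
One standard way to estimate $\tau_{\mathrm{int}}$ from a chain of topological charges uses Sokal's automatic windowing; a sketch (the actual analysis may differ in details):

```python
import numpy as np

def integrated_autocorr_time(q: np.ndarray, c: float = 5.0) -> float:
    """Estimate tau_int of a 1D chain via windowed autocorrelation sums."""
    q = q - q.mean()
    n = len(q)
    # Normalized autocorrelation rho(t) for lags t = 0, ..., n-1
    acf = np.correlate(q, q, mode="full")[n - 1:] / (q.var() * np.arange(n, 0, -1))
    tau = 1.0 + 2.0 * np.cumsum(acf[1:])  # tau_int(M) = 1 + 2 sum_{t<=M} rho(t)
    window = np.arange(1, n) >= c * tau   # Sokal: smallest M with M >= c * tau(M)
    idx = int(np.argmax(window)) if window.any() else n - 2
    return float(tau[idx])
```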

## Comparison {background-color="#1c1c1c"}

::: {#fig-comparison layout-ncol=2}

![Trained model](https://saforem2.github.io/anl-job-talk/assets/dQint_eval.svg){#fig-eval}

![Generic HMC](https://saforem2.github.io/anl-job-talk/assets/dQint_hmc.svg){#fig-hmc}

Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for the
trained model [@fig-eval] vs. HMC [@fig-hmc]
:::

## Plaquette analysis: $x_{P}$ {.centeredslide background-color="#1c1c1c"}

:::: {.columns}

::: {.column width="55%"}

[Deviation from $V\rightarrow\infty$ limit,  $x_{P}^{\ast}$]{.dim-text style="text-align:center; font-size:0.9em;"}
:::

::: {.column width="45%"}

[Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted-lines)]{.dim-text style="text-align:right!important; font-size:0.9em;"}
:::

::::

::: {#fig-avg-plaq}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/plaqsf_vs_lf_step1.svg){width="100%"}

Plot showing how **average plaquette**, $\left\langle x_{P}\right\rangle$
varies over a single trajectory for models trained at different $\beta$, with
varying trajectory lengths $N_{\mathrm{LF}}$
:::

## Loss Function {background-color="#1c1c1c"}

- Want to maximize the _expected_ squared charge difference[^charge-diff]:
  $$\begin{equation*}
  \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) =
  {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2}
  \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big]
  \end{equation*}$$

- Where:
    - $\delta Q$ is the _tunneling rate_:
      $$\begin{equation*}
      \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right|
      \end{equation*}$$

    - $A(\xi^{\ast}|\xi)$ is the probability[^jacobian] of accepting the proposal $\xi^{\ast}$:
      $$\begin{equation*}
      A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1,
      \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial
      \xi^{T}}\right|\right)
      \end{equation*}$$

[^charge-diff]: Where $\xi^{\ast}$ is the _proposed_ configuration (prior to
Accept / Reject)
[^jacobian]: And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the
Jacobian of the transformation from $\xi \rightarrow \xi^{\ast}$

## Networks 2D $U(1)$ {auto-animate=true background-color="#1c1c1c"}

- Stack gauge links as `shape`$\left(U_{\mu}\right)$` = [Nb, 2, Nt, Nx]` $\in \mathbb{C}$

  $$ x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]$$

  with `shape`$\left(x_{\mu}\right)$` = [Nb, 2, Nt, Nx, 2]` $\in \mathbb{R}$ (see the sketch below)

- $x$-Network:
    - [$\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{.purple-text}

- $v$-Network:
  - [$\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{.green-text} [$\hspace{2pt}\longleftarrow$ let's look at this]{.dim-text}
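
A minimal sketch of this stacking (sizes illustrative):

```python
import torch

# Illustrative sizes; link angles x_mu(n) in [-pi, pi), shape [Nb, 2, Nt, Nx]
Nb, Nt, Nx = 8, 16, 16
x = 2 * torch.pi * torch.rand(Nb, 2, Nt, Nx) - torch.pi

# Stack [cos(x), sin(x)] along a trailing axis -> shape [Nb, 2, Nt, Nx, 2]
x_stacked = torch.stack([torch.cos(x), torch.sin(x)], dim=-1)
```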

## $v$-Update[^reverse] {background-color="#1c1c1c"}

- [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:  

$$\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]$$

- [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$:

$$\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}$$

[^reverse]: [Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$]{style="font-size:0.8em;"}

## $x$-Update {background-color="#1c1c1c"}

- [forward]{style="color:#FF5252"} $(d = \textcolor{#FF5252}{+})$:

$$\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]$$

- [backward]{style="color:#1A8FFF;"} $(d = \textcolor{#1A8FFF}{-})$:

$$\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}$$
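
Composing the two generalized updates gives one L2HMC leapfrog layer. A forward-direction sketch that omits the $x$-masking used in practice (`v_net` / `x_net` stand in for $\varphi_{\theta}$ / $\psi_{\theta}$, and `force(x)` is assumed to return $dS/dx$):

```python
import torch

def leapfrog_layer(x, v, force, v_net, x_net, eps: float):
    """One generalized leapfrog step (forward direction), without masking."""
    sv, tv, qv = v_net(x, v)  # half-step v-update (Gamma^+)
    v = v * torch.exp(0.5 * eps * sv) - 0.5 * eps * (force(x) * torch.exp(eps * qv) + tv)
    sx, tx, qx = x_net(x, v)  # x-update (Lambda^+)
    x = x * torch.exp(0.5 * eps * sx) - 0.5 * eps * (v * torch.exp(eps * qx) + tx)
    sv, tv, qv = v_net(x, v)  # half-step v-update (Gamma^+)
    v = v * torch.exp(0.5 * eps * sv) - 0.5 * eps * (force(x) * torch.exp(eps * qv) + tv)
    return x, v
```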


## Lattice Gauge Theory (2D $U(1)$) {.centeredslide background-color="#1c1c1c"}

:::: {.columns layout-valign="top"}

::: {.column width="50%"}

::: {style="text-align:center;"}

::: {.callout-note icon=false collapse=false title="🔗 Link Variables" style="width:100%!important; text-align:left;"}
$$U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad$$
$$x_{\mu}(n) \in [-\pi,\pi)$$
:::

::: {}

::: {.callout-important icon=false collapse=false title="🫸 Wilson Action" style="width:100%!important; text-align:left;"}
$$S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},$$

$$\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu})
- x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]$$
:::

[**Note**: $\textcolor{#00CCFF}{x_{P}}$ is the phase of the product of
links around a $1\times 1$ square, called a ["plaquette"]{.blue-text}; see the code sketch below]{.dim-text style=font-size:0.8em;}
:::

:::

:::

::: {.column width="50%"}

![2D Lattice](https://raw.githubusercontent.com/saforem2/deep-fridays/main/assets/u1lattice.dark.svg){width="80%"}

:::

::::
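
A sketch of computing $x_{P}$ and the action for the batched angle representation used above (periodic boundaries via `torch.roll`; sign and normalization conventions vary):

```python
import torch

def plaquettes(x: torch.Tensor) -> torch.Tensor:
    """x_P = x_mu(n) + x_nu(n + mu-hat) - x_mu(n + nu-hat) - x_nu(n),
    for angles x of shape [Nb, 2, Nt, Nx]."""
    xmu, xnu = x[:, 0], x[:, 1]
    return (xmu + torch.roll(xnu, shifts=-1, dims=1)
            - torch.roll(xmu, shifts=-1, dims=2) - xnu)

def wilson_action(x: torch.Tensor, beta: float) -> torch.Tensor:
    """S_beta(x) = beta * sum_P cos(x_P), per batch element."""
    return beta * torch.cos(plaquettes(x)).sum(dim=(1, 2))
```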


## {background-color="white"}

::: {#fig-notebook}

<iframe data-src="https://nbviewer.org/github/saforem2/l2hmc-qcd/blob/SU3/src/l2hmc/notebooks/l2hmc-2dU1.ipynb" width="100%" height="650" title="l2hmc-qcd"></iframe>

Jupyter Notebook

:::

## Annealing Schedule {#sec-annealing-schedule background-color="#1c1c1c"}
 
- Introduce an _annealing schedule_ during the training phase:

  $$\left\{ \gamma_{t}  \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1},
  \ldots, \gamma_{N-1}, \gamma_{N} \right\}$$

  where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$  

- [**Note**]{.green-text}: 
    - for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the
      height of the energy barriers $\Longrightarrow$
    - easier for our sampler to explore previously inaccessible regions of the phase space
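
One simple way to realize such a schedule (linear here; $\gamma_{0}$ and $N$ are illustrative choices) is to rescale the action at each training step:

```python
import numpy as np

N, gamma0 = 1000, 0.1                     # illustrative choices
gammas = np.linspace(gamma0, 1.0, N + 1)  # gamma_0 < gamma_1 < ... < gamma_N = 1

# At training step t, sample from the flattened target ~ exp(-gamma_t * S(x)),
# so that for gamma_t < 1 the energy barriers are lower.
```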


## Networks 2D $U(1)$ {#sec-networks-2dU1 background-color="#1c1c1c"}

- Stack gauge links as `shape`$\left(U_{\mu}\right)$` = [Nb, 2, Nt, Nx]` $\in \mathbb{C}$

  $$ x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]$$

  with `shape`$\left(x_{\mu}\right)$` = [Nb, 2, Nt, Nx, 2]` $\in \mathbb{R}$

- $x$-Network:
    - [$\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$]{.purple-text}

- $v$-Network:
    - [$\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$]{.green-text}

## Toy Example: GMM $\in \mathbb{R}^{2}$ {.centeredslide background-color="#1c1c1c"}

![](https://raw.githubusercontent.com/saforem2/l2hmc-dwq25/main/docs/assets/iso_gmm_chains.svg){#fig-gmm .r-stretch}

## Physical Quantities {background-color="#1c1c1c"}

- To estimate physical quantities, we:
  - Calculate physical observables at **increasing** spatial resolution
  - Perform extrapolation to continuum limit

::: {#fig-continuum}

![](https://raw.githubusercontent.com/saforem2/physicsSeminar/main/assets/static/continuum.svg)

Increasing the physical resolution ($a \rightarrow 0$) allows us to make
predictions about numerical values of physical quantities in the continuum
limit.

:::

## Extra {#sec-extra background-color="#1c1c1c"}

[![](./assets/thumbnail.png)]{.preview-image style="text-align:center; margin-left:auto; margin-right: auto;"}