Published: July 31, 2023
Modified: June 17, 2025

MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Overview

  1. Background: {MCMC,HMC}
  2. L2HMC: Generalizing MD
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

Note🎯 Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} \mathcal{O}(x)\, p(x)

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}\left[\mathcal{O}(x)\right] \Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}
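To make the \frac{1}{\sqrt{N}} scaling concrete, here is a minimal sketch (a toy Gaussian action of our own choosing, not the lattice theory) that estimates \left\langle x^{2}\right\rangle and its error from independent samples:

```python
import numpy as np

# Toy example: p(x) ∝ exp(-S(x)) with S(x) = x²/2 is a standard Gaussian,
# so independent samples can be drawn directly.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)                   # independent draws from p(x)

obs = x**2                               # observable O(x) = x²
mean = obs.mean()                        # <O> ≈ (1/N) Σ O(x_n)
err = obs.std(ddof=1) / np.sqrt(N)       # σ_O = sqrt(Var[O] / N) ∝ 1/√N
print(f"<x^2> = {mean:.3f} ± {err:.3f}")  # expect ≈ 1
```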

Markov Chain Monte Carlo (MCMC)

Note🎯 Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} \mathcal{O}(x)\, p(x)

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}\left[\mathcal{O}(x)\right]
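A hedged sketch of how \tau^{\mathcal{O}}_{\mathrm{int}} enters in practice: estimate it from the chain's autocorrelation function and inflate the naive error bar by \sqrt{\tau_{\mathrm{int}}}. The AR(1) chain, observable, and truncation window below are illustrative stand-ins for real MCMC output:

```python
import numpy as np

# An AR(1) process stands in for a correlated Markov chain.
rng = np.random.default_rng(0)
N, rho = 50_000, 0.95
x = np.zeros(N)
for i in range(1, N):
    x[i] = rho * x[i - 1] + np.sqrt(1 - rho**2) * rng.normal()

obs = x**2                                    # observable O(x) = x²
max_lag = 500
acf = np.array([1.0] + [np.corrcoef(obs[:-t], obs[t:])[0, 1]
                        for t in range(1, max_lag)])
W = np.argmax(acf < 0.05)                     # crude truncation window
tau_int = 1.0 + 2.0 * acf[1:W].sum()          # τ_int = 1 + 2 Σ_t ρ(t)

err_naive = obs.std(ddof=1) / np.sqrt(N)      # assumes independent samples
err_corrected = err_naive * np.sqrt(tau_int)  # σ²_O = τ_int · Var[O] / N
print(f"tau_int ≈ {tau_int:.1f}: {err_naive:.4f} → {err_corrected:.4f}")
```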

Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Tip🪄 Trick
  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed, independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Tip🔋 Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Tip🔋 Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Note🐸 Leapfrog Step

input \left(x, v\right) \rightarrow \left(x', v'\right) output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning⚠️ Warning!
  • Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
    at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
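A minimal sketch of a single leapfrog step \Gamma \circ \Lambda \circ \Gamma, using a toy quadratic action S(x) = \frac{1}{2}x^{T}x so that the force is simply \partial_{x}S(x) = x (everything here is illustrative, not the lattice code):

```python
import numpy as np

def grad_S(x):
    # force term ∂ₓS(x) for the toy action S(x) = xᵀx / 2
    return x

def leapfrog(x, v, eps):
    v_half = v - 0.5 * eps * grad_S(x)           # Γ: half-step momentum kick
    x_new = x + eps * v_half                     # Λ: full-step position drift
    v_new = v_half - 0.5 * eps * grad_S(x_new)   # Γ: half-step momentum kick
    return x_new, v_new

x, v = np.array([1.0, -0.5]), np.array([0.3, 0.2])
for _ in range(10):                              # a short trajectory
    x, v = leapfrog(x, v, eps=0.1)
```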

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps: \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose xx' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion,
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
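Putting the pieces together, a hedged sketch of one full HMC update for the same toy Gaussian target (for the leapfrog integrator \left|\frac{\partial x'}{\partial x}\right| = 1, so only the change in H enters the acceptance probability):

```python
import numpy as np

def S(x):       return 0.5 * np.dot(x, x)   # toy action
def grad_S(x):  return x                    # its force

def hmc_update(x, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    v = rng.normal(size=x.shape)                  # resample v ~ N(0, 1)
    h0 = S(x) + 0.5 * np.dot(v, v)                # H(x, v)
    x1, v1 = x.copy(), v.copy()
    for _ in range(n_lf):                         # N_LF leapfrog steps
        v1 -= 0.5 * eps * grad_S(x1)
        x1 += eps * v1
        v1 -= 0.5 * eps * grad_S(x1)
    h1 = S(x1) + 0.5 * np.dot(v1, v1)             # H(x', v')
    if rng.random() < np.exp(h0 - h1):            # A = min{1, e^{-(H' - H)}}
        return x1, True
    return x, False                               # reject: keep current state

x = np.zeros(2)
for _ in range(1000):
    x, accepted = hmc_update(x)
```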

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

HMC Samples with \varepsilon = 0.25

HMC Samples with \varepsilon = 0.5
Figure 3: HMC samples generated with varying step sizes \varepsilon

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
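A hedged numpy sketch of measuring Q on a random 2D U(1) configuration of link angles (the [2, Nt, Nx] array layout and the random configuration are our own choices for illustration):

```python
import numpy as np

def plaquette_angles(x):
    # x_P = x_mu(n) + x_nu(n + mu) - x_mu(n + nu) - x_nu(n), with mu=0 (t), nu=1 (x)
    return (x[0]
            + np.roll(x[1], -1, axis=0)
            - np.roll(x[0], -1, axis=1)
            - x[1])

def project(xp):
    # ⌊x_P⌋ = x_P - 2π ⌊(x_P + π) / 2π⌋, i.e. project back into [-π, π)
    return xp - 2 * np.pi * np.floor((xp + np.pi) / (2 * np.pi))

def topological_charge(x):
    # Q = (1/2π) Σ_P ⌊x_P⌋ is an integer on a periodic lattice
    return int(np.rint(project(plaquette_angles(x)).sum() / (2 * np.pi)))

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(2, 16, 16))   # link angles x_mu(n)
print(topological_charge(x))
```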

Important⏳ Critical Slowing Down
  • Q gets stuck!
    • as \beta \longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note: \delta Q \rightarrow 0 at increasing \beta

Can we do better?

  • Introduce two (invertible) NNs, vNet and xNet:
    • vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
    • xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)

 

  • Use these (s, t, q) in the generalized MD update:
    • \Gamma_{\theta}^{\pm}: ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
    • \Lambda_{\theta}^{\pm}: (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)
Figure 4: Generalized MD update where \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} are invertible NNs

L2HMC: Generalizing the MD Update

  • Introduce d \sim \mathcal{U}(\pm) to determine the direction of our update

    1. \textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v}) \hspace{46pt} update v

    2. \textcolor{#AE81FF}{x'} = x_{B} + \Lambda^{\pm}(x_{A}, \textcolor{#07B875}{v'}) \hspace{10pt} update first half: x_{A}

    3. \textcolor{#AE81FF}{x''} = x'_{A} + \Lambda^{\pm}(x'_{B}, \textcolor{#07B875}{v'}) \hspace{8pt} update other half: x_{B}

    4. \textcolor{#07B875}{v''} = \Gamma^{\pm}(\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v'}) \hspace{36pt} update v

  • Resample both v \sim \mathcal{N}(0, \mathbb{1}) and d \sim \mathcal{U}(\pm) at the beginning of each trajectory
    • To ensure ergodicity + reversibility, we split the x update into sequential (complementary) updates
    • Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)
Figure 5: Generalized MD update with \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} invertible NNs

L2HMC: Leapfrog Layer

L2HMC Update

Important👨‍💻 Algorithm
  1. input: \textcolor{#AE81FF}{x}

    • Resample: \textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1});\quad d \sim \mathcal{U}(\pm)
    • Construct initial state: \textcolor{#939393}{\xi} = (\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})
  2. forward: Generate proposal \xi' by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#939393}{\xi} \xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})

    • Accept / Reject: \begin{equation*} A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}}) = \mathrm{min}\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\} \end{equation*}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L} \gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}) and backprop
  4. return: \textcolor{#AE81FF}{x}_{i+1}
    Evaluate the MH criterion and return the accepted config: \textcolor{#AE81FF}{x_{i+1}} \gets \begin{cases} \textcolor{#AE81FF}{x''} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#AE81FF}{x} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi'}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫 \end{cases}

Figure 6: Leapfrog Layer used in generalized MD update
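To make the forward / backward structure above concrete, here is a heavily simplified, hedged sketch of a training loop (not the l2hmc-qcd implementation): a 1D Gaussian target, a single trainable scale s_θ in the \Lambda update so that \log|\mathcal{J}| = \varepsilon\, s_{\theta} per layer, and an expected squared jump distance in place of the \delta Q^{2} loss. All names (s_theta, eps, n_lf) are illustrative:

```python
import torch

def S(x):
    return 0.5 * x ** 2                               # toy action, ∂S/∂x = x

eps, n_lf = 0.2, 5
s_theta = torch.zeros(1, requires_grad=True)          # stand-in for the x-network
opt = torch.optim.Adam([s_theta], lr=1e-2)

x = torch.randn(1024)                                 # a batch of chains
for step in range(100):
    v = torch.randn_like(x)                           # resample v ~ N(0, 1)
    h0 = S(x) + 0.5 * v ** 2
    x1, v1, logJ = x, v, torch.zeros_like(x)
    for _ in range(n_lf):                             # N_LF "leapfrog layers"
        v1 = v1 - 0.5 * eps * x1                      # Γ (plain half-kick)
        x1 = x1 * torch.exp(eps * s_theta) + eps * v1 # Λ_θ (scaled drift)
        logJ = logJ + eps * s_theta                   # accumulate log|J|
        v1 = v1 - 0.5 * eps * x1                      # Γ
    h1 = S(x1) + 0.5 * v1 ** 2
    A = torch.clamp(torch.exp(h0 - h1 + logJ), max=1.0)  # acceptance probability
    loss = -(A * (x1 - x) ** 2).mean()                # expected squared jump distance
    opt.zero_grad()
    loss.backward()                                   # backward (if training)
    opt.step()
    accept = torch.rand_like(A) < A                   # Metropolis-Hastings step
    x = torch.where(accept, x1, x).detach()           # return x_{i+1}
```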

4D SU(3) Model

Note🔗 Link Variables
  • Write link variables U_{\mu}(x) \in SU(3):

    \begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right] \\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}

    where \omega^{k}_{\mu}(x) \in \mathbb{R}, and \lambda^{k} are the generators of SU(3)

Tip🏃‍♂️‍➡️ Conjugate Momenta
  • Introduce P_{\mu}(x) = P^{k}_{\mu}(x)\, \lambda^{k}, conjugate to \omega^{k}_{\mu}(x)
Important🟥 Wilson Action

S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]

where U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x+\hat{\mu})\, U^{\dagger}_{\mu}(x+\hat{\nu})\, U^{\dagger}_{\nu}(x)

Figure 7: Illustration of the lattice

HMC: 4D SU(3)

Hamiltonian: H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow

  • U update: \frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P
    \begin{align*} Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \Longrightarrow \\ -i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\ U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)}\, U(0) \Longrightarrow \\ \textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U \end{align*}

    \textcolor{#FFEE58}{\varepsilon} is the step size

  • P update: \frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow
    \begin{align*} P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \textcolor{#FFEE58}{\varepsilon}\, \textcolor{#E599F7}{F[U]} \\ \textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U] \end{align*}

    \textcolor{#E599F7}{F[U]} is the force term
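As a hedged, single-link toy check (not the production code), the \Lambda update U \rightarrow e^{i\varepsilon P} U can be written with a matrix exponential; building P as a traceless Hermitian matrix (an element of \mathfrak{su}(3)) keeps U' in SU(3):

```python
import numpy as np

rng = np.random.default_rng(0)

def expi(H):
    # exp(i·H) for Hermitian H, via its eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w)) @ V.conj().T

def random_su3_algebra():
    # random traceless Hermitian 3x3 matrix, i.e. Q = ω^k λ^k ∈ su(3)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    H = 0.5 * (A + A.conj().T)
    return H - (np.trace(H) / 3) * np.eye(3)

eps = 0.1
U = expi(random_su3_algebra())               # U = e^{iQ} ∈ SU(3)
P = random_su3_algebra()                     # conjugate momentum P = P^k λ^k
U_new = expi(eps * P) @ U                    # Λ: U → U' = e^{iεP} U

print(np.allclose(U_new.conj().T @ U_new, np.eye(3)))   # still unitary
print(np.isclose(np.linalg.det(U_new), 1.0))            # det U' = 1
```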

HMC: 4D SU(3)

  • Momentum Update: \textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]

  • Link Update: \textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U

  • We maintain a batch of Nb lattices, all updated in parallel

    • U.dtype = complex128
    • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
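A minimal sketch of this batched layout (the lattice dimensions and the cold-start initialization below are our own choices for illustration):

```python
import numpy as np

Nb, Nt, Nx, Ny, Nz = 8, 4, 4, 4, 4
# one 3x3 complex matrix per (batch, direction, site)
U = np.zeros((Nb, 4, Nt, Nx, Ny, Nz, 3, 3), dtype=np.complex128)
U[..., np.arange(3), np.arange(3)] = 1.0     # identity on every link ("cold" start)

print(U.shape, U.dtype)   # (8, 4, 4, 4, 4, 4, 3, 3) complex128
```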

Networks 4D SU(3)

 

 

U-Network:

UNet: (U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)

 

P-Network:

PNet: (U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)

\uparrow
let’s look at this

P-Network (pt. 1)

  • input: \left(U, F\right) := (e^{i Q}, F) \begin{align*} h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\ \textcolor{#FF5252}{z} &:= \sigma\left(w_{n+1} h_{n} + b_{n+1}\right) \end{align*}
  • output: (s_{P}, t_{P}, q_{P})

    • s_{P} = \lambda_{s} \tanh(w_{s} \textcolor{#FF5252}{z} + b_{s})
    • t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}
    • q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})
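A small numpy sketch of these output heads (the hidden size, random initialization, and variable names are illustrative only; \lambda_{s}, \lambda_{q} are the trainable scale parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 16, 32                             # illustrative sizes

z = np.tanh(rng.normal(size=hidden))             # stand-in for the last hidden layer

w_s, b_s = 0.01 * rng.normal(size=(dim, hidden)), np.zeros(dim)
w_t, b_t = 0.01 * rng.normal(size=(dim, hidden)), np.zeros(dim)
w_q, b_q = 0.01 * rng.normal(size=(dim, hidden)), np.zeros(dim)
lam_s, lam_q = np.ones(dim), np.ones(dim)        # trainable λ_s, λ_q

s_P = lam_s * np.tanh(w_s @ z + b_s)             # scaling output
t_P = w_t @ z + b_t                              # translation output
q_P = lam_q * np.tanh(w_q @ z + b_q)             # transformation output
```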

P-Network (pt. 2)

  • Use (s_{P}, t_{P}, q_{P}) to update \Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}
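Because \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, the backward update undoes the forward one for fixed (s_{P}, t_{P}, q_{P}) and F. A quick numpy check of that round trip (illustrative values only, not the actual implementation):

```python
import numpy as np

eps = 0.1
rng = np.random.default_rng(0)
P, F = rng.normal(size=4), rng.normal(size=4)
s_P, t_P, q_P = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

def gamma_fwd(P):   # Γ⁺: P·exp(ε s_P / 2) − (ε/2)[F·exp(ε q_P) + t_P]
    return P * np.exp(0.5 * eps * s_P) - 0.5 * eps * (F * np.exp(eps * q_P) + t_P)

def gamma_bwd(P):   # Γ⁻: exp(−ε s_P / 2)·{P + (ε/2)[F·exp(ε q_P) + t_P]}
    return np.exp(-0.5 * eps * s_P) * (P + 0.5 * eps * (F * np.exp(eps * q_P) + t_P))

print(np.allclose(gamma_bwd(gamma_fwd(P)), P))   # True: (Γ⁺)⁻¹ = Γ⁻
```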

Results: 2D U(1)

Important📈 Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Interpretation

Deviation in x_{P}

Topological charge mixing

Artificial influx of energy

Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory.

Interpretation

Average plaquette: \langle x_{P}\rangle vs. LF step

Average energy: H - \sum\log|\mathcal{J}|
Figure 9: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D SU(3) Results

  • Distribution of \log|\mathcal{J}| over all chains, at each leapfrog step, N_{\mathrm{LF}} (= 0, 1, \ldots, 8) during training:
Figure 10: 100 train iters
Figure 11: 500 train iters
Figure 12: 1000 train iters

4D SU(3) Results: \delta U_{\mu\nu}

Figure 13: The difference in the average plaquette, \left|\delta U_{\mu\nu}\right|^{2}, between the trained model and HMC

4D SU(3) Results: \delta U_{\mu\nu}

Figure 14: The difference in the average plaquette, \left|\delta U_{\mu\nu}\right|^{2}, between the trained model and HMC

Next Steps

  • Further code development

  • Continue to use / test different network architectures

    • Gauge equivariant NNs for the U_{\mu}(x) update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

Note🙏 Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References


Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

(a) Trained model
(b) Generic HMC
Figure 16: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for the trained model vs. HMC

Plaquette analysis: x_{P}

Deviation from the V \rightarrow \infty limit, x_{P}^{\ast}

Average \langle x_{P}\rangle, with x_{P}^{\ast} (dotted lines)

Figure 17: Plot showing how the average plaquette \left\langle x_{P}\right\rangle varies over a single trajectory for models trained at different \beta, with varying trajectory lengths N_{\mathrm{LF}}

Loss Function

  • Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\big[-\textcolor{#FA5252}{\delta Q}^{2}\left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi) = \left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
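A hedged sketch of evaluating this loss on a batch of proposals. Here \delta Q and A are random placeholders; in training they come from the sampler and remain differentiable so the loss can be backpropagated:

```python
import numpy as np

rng = np.random.default_rng(0)
Q      = rng.integers(-3, 4, size=128).astype(float)   # charges of current configs
Q_star = rng.integers(-3, 4, size=128).astype(float)   # charges of proposals
A      = rng.uniform(size=128)                          # acceptance probabilities

delta_Q = np.abs(Q_star - Q)                # tunneling rate δQ(ξ*, ξ)
loss = -np.mean(delta_Q**2 * A)             # L_θ = -E[ δQ² · A(ξ*|ξ) ]
```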

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right) = [Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right) = [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
  • v-Network:

    • \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \hspace{2pt}\longleftarrow let's look at this

v-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}

x-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}
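As with \Gamma^{\pm}, the backward update inverts the forward one for fixed (s_{x}, t_{x}, q_{x}) and v; a quick numpy check (illustrative values only):

```python
import numpy as np

eps = 0.1
rng = np.random.default_rng(1)
x, v = rng.normal(size=4), rng.normal(size=4)
s_x, t_x, q_x = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

def lam_fwd(x):   # Λ⁺
    return x * np.exp(0.5 * eps * s_x) - 0.5 * eps * (v * np.exp(eps * q_x) + t_x)

def lam_bwd(x):   # Λ⁻
    return np.exp(-0.5 * eps * s_x) * (x + 0.5 * eps * (v * np.exp(eps * q_x) + t_x))

print(np.allclose(lam_bwd(lam_fwd(x)), x))   # True: Λ⁻ inverts Λ⁺
```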

Lattice Gauge Theory (2D U(1))

Note🔗 Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Important🫸 Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the sum of the link angles around a 1\times 1 square, called a “plaquette”

2D Lattice

Figure 18: Jupyter Notebook

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
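A minimal sketch of such a schedule (the linear spacing, the starting value \gamma_{0} = 0.1, and the toy action below are assumptions for illustration):

```python
import numpy as np

n_train = 1000
gammas = np.linspace(0.1, 1.0, n_train)      # γ_0 < γ_1 < … < γ_N ≡ 1, small increments

def S(x, beta=1.0):                          # toy stand-in for the action
    return beta * np.sum(1.0 - np.cos(x))

x = np.random.default_rng(0).uniform(-np.pi, np.pi, size=16)
for t, gamma in enumerate(gammas):
    S_t = gamma * S(x)                       # annealed action γ_t · S(x) used at step t
    # ... one training step would use S_t in place of S ...
```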

Toy Example: GMM \in \mathbb{R}^{2}

Figure 19

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to continuum limit
Figure 20: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.


Footnotes

  1. Here, \sim means “is distributed according to”↩︎

  2. Here, \sim means “is distributed according to”↩︎

  3. We always start by resampling the momentum, v_{0} \sim \mathcal{N}(0, \mathbb{1})↩︎

  4. L2HMC: (Foreman, Jin, and Osborn 2021; Foreman, Jin, and Osborn 2022)↩︎

  5. For the simple \mathbf{x} \in \mathbb{R}^{2} example, \mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}↩︎

  6. \sigma(\cdot) denotes an activation function↩︎

  7. \lambda_{s}, \lambda_{q} \in \mathbb{R}, trainable parameters↩︎

  8. Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)↩︎

  9. Where \xi^{\ast} is the proposed configuration (prior to Accept / Reject)↩︎

  10. And \left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right| is the Jacobian of the transformation from \xi \rightarrow \xi^{\ast}↩︎

  11. Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)↩︎

Citation

For attribution, please cite this work as:
Foreman, Sam. 2023. “MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory.” Presentation at the 2023 International Symposium on Lattice Field Theory, July 31. https://indico.fnal.gov/event/57249/contributions/271305/. https://saforem2.github.io/lattice23.