MLMC: Machine Learning Monte Carlo

Published: July 31, 2023
Modified: June 17, 2025

MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Overview

  1. Background: {MCMC,HMC}
  2. L2HMC: Generalizing MD
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

Note๐ŸŽฏ Goal

Generate independent samples {xi}\{x_{i}\}, such that {xi}โˆผp(x)โˆeโˆ’S(x)\{x_{i}\} \sim p(x) \propto e^{-S(x)} where S(x)S(x) is the action (or potential energy)

  • Want to calculate observables O\mathcal{O}:
    โŸจOโŸฉโˆโˆซ[Dx]O(x)โ€‰p(x)\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: โŸจOโŸฉโ‰ƒ1Nโˆ‘n=1NO(xn)\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
ฯƒO2=1NVar[O(x)]โŸนฯƒOโˆ1N\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}


Instead, nearby configs are correlated, and we incur a factor of the integrated autocorrelation time, $\tau^{\mathcal{O}}_{\mathrm{int}}$:
$\sigma_{\mathcal{O}}^{2} = \frac{\tau^{\mathcal{O}}_{\mathrm{int}}}{N}\mathrm{Var}\left[\mathcal{O}(x)\right]$
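To make the penalty concrete, here is a minimal sketch (plain NumPy; a toy AR(1) chain stands in for real MCMC output, and `integrated_autocorr_time` is an illustrative helper, not part of l2hmc-qcd) of estimating $\tau_{\mathrm{int}}$ and the corrected error bar:

```python
import numpy as np

def integrated_autocorr_time(x, window=200):
    """Estimate tau_int = 1 + 2 * sum_t rho(t), truncating at a fixed window."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    var = xc.var()
    rho = [xc[:-t] @ xc[t:] / ((len(x) - t) * var) for t in range(1, window)]
    return 1.0 + 2.0 * float(np.sum(rho))

# correlated toy chain (AR(1)) standing in for MCMC samples of an observable
rng = np.random.default_rng(0)
obs = np.zeros(10_000)
for i in range(1, obs.size):
    obs[i] = 0.9 * obs[i - 1] + rng.normal()

tau = integrated_autocorr_time(obs)
naive = obs.std() / np.sqrt(obs.size)
print(f"tau_int ~ {tau:.1f}; naive err {naive:.4f} -> corrected {naive * np.sqrt(tau):.4f}")
```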

Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states:
    $x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{N}$

    such that, as $N \rightarrow \infty$:
    $\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N}\right\} \xrightarrow{N\rightarrow\infty} p(x) \propto e^{-S(x)}$

🪄 Trick
  • Introduce fictitious momentum $v \sim \mathcal{N}(0, \mathbb{1})$
    • Normally distributed, independent of $x$, i.e.
      $p(x, v) = p(x)\,p(v) \propto e^{-S(x)}\, e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}v\right]} = e^{-H(x, v)}$

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$

  • Write the joint distribution $p(x, v)$:
    $p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2}v^{T}v} = e^{-H(x, v)}$

🔋 Hamiltonian Dynamics

$H = S[x] + \frac{1}{2} v^{T}v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H$

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Tip๐Ÿ”‹ Hamiltonian Dynamics

(xห™,vห™)=(โˆ‚vH,โˆ’โˆ‚xH)\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Note๐Ÿธ Leapfrog Step

input โ€‰(x,v)โ†’(xโ€ฒ,vโ€ฒ)โ€‰\,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

v~:=ฮ“(x,v)=vโˆ’ฮต2โˆ‚xS(x)xโ€ฒ:=ฮ›(x,v~)โ€‰=x+ฮตโ€‰v~vโ€ฒ:=ฮ“(xโ€ฒ,v~)=v~โˆ’ฮต2โˆ‚xS(xโ€ฒ)\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warningโš ๏ธ Warning!
  • Resample v0โˆผN(0,1)v_{0} \sim \mathcal{N}(0, \mathbb{1})
    at the beginning of each trajectory

Note: โˆ‚xS(x)\partial_{x} S(x) is the force
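A minimal sketch of a single leapfrog step, assuming only a generic action gradient `grad_S` (standalone NumPy, not the l2hmc-qcd API):

```python
import numpy as np

def leapfrog_step(x, v, grad_S, eps):
    v_half = v - 0.5 * eps * grad_S(x)          # Gamma: half-step momentum kick
    x_new = x + eps * v_half                    # Lambda: full-step position drift
    v_new = v_half - 0.5 * eps * grad_S(x_new)  # Gamma: second half-step kick
    return x_new, v_new

# example: harmonic action S(x) = x^2 / 2, so the force is grad_S(x) = x
x, v = np.array([1.0]), np.array([0.0])
for _ in range(10):
    x, v = leapfrog_step(x, v, lambda x: x, eps=0.1)
```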

HMC Update

  • We build a trajectory of $N_{\mathrm{LF}}$ leapfrog steps:
    $(x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v')$

  • And propose $x'$ as the next state in our chain

$$\begin{aligned} \Gamma: (x, v) \rightarrow v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \Lambda: (x, v) \rightarrow x' &:= x + \varepsilon v \end{aligned}$$

  • We then accept / reject $x'$ using the Metropolis-Hastings criterion,
    $A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$
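Putting the pieces together, a hedged, self-contained sketch of one HMC update (NumPy; `S` and `grad_S` are assumed toy callables). Since the leapfrog integrator is volume preserving, the Jacobian factor is 1 and the acceptance probability reduces to $\min\{1, e^{H_{0} - H_{1}}\}$:

```python
import numpy as np

def hmc_update(x, S, grad_S, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    v = rng.normal(size=x.shape)              # resample momentum each trajectory
    h0 = S(x) + 0.5 * (v ** 2).sum()          # initial H(x, v)
    xp = x.copy()
    for _ in range(n_lf):                     # N_LF leapfrog steps
        v = v - 0.5 * eps * grad_S(xp)
        xp = xp + eps * v
        v = v - 0.5 * eps * grad_S(xp)
    h1 = S(xp) + 0.5 * (v ** 2).sum()         # proposed H(x', v')
    if rng.uniform() < np.exp(min(0.0, h0 - h1)):   # Metropolis-Hastings
        return xp, True
    return x, False

# usage: sample from p(x) ~ exp(-x^2 / 2)
S = lambda x: 0.5 * (x ** 2).sum()
grad_S = lambda x: x
x, chain = np.zeros(2), []
for _ in range(1000):
    x, _ = hmc_update(x, S, grad_S)
    chain.append(x.copy())
print(np.mean(chain, axis=0), np.std(chain))
```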

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly โ†’\rightarrow slow mixing
    • Cannot easily traverse low-density zones โ†’\rightarrow slow convergence

HMC samples with $\varepsilon = 0.25$

HMC samples with $\varepsilon = 0.5$
Figure 3: HMC samples generated with varying step sizes $\varepsilon$

Topological Freezing

Topological Charge: $Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}$

note: $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$

Importantโณ Critical Slowing Down
  • QQ gets stuck!
    • as ฮฒโŸถโˆž\beta\longrightarrow \infty:
      • QโŸถconst.Q \longrightarrow \text{const.}
      • ฮดQ=(Qโˆ—โˆ’Q)โ†’0โŸน\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: ฯ„intQโŸถโˆž\tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note \delta Q \rightarrow 0 at increasing \beta

Note ฮดQโ†’0\delta Q \rightarrow 0 at increasing ฮฒ\beta
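For the 2D $U(1)$ theory (defined in the extras), $x_{P}$ is the sum of link angles around a plaquette; a sketch (NumPy, with a hypothetical `[mu, Nt, Nx]` angle layout) of computing the integer-valued $Q$ from the projection above:

```python
import numpy as np

def plaquette_angles(x):
    """x_P = x_mu(n) + x_nu(n+mu) - x_mu(n+nu) - x_nu(n), periodic lattice."""
    x0, x1 = x[0], x[1]
    return x0 + np.roll(x1, -1, axis=0) - np.roll(x0, -1, axis=1) - x1

def floor_proj(xp):
    """Project x_P into (-pi, pi]: x_P - 2*pi*floor((x_P + pi) / (2*pi))."""
    return xp - 2 * np.pi * np.floor((xp + np.pi) / (2 * np.pi))

def topological_charge(x):
    return np.sum(floor_proj(plaquette_angles(x))) / (2 * np.pi)

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(2, 8, 8))   # angles, shape [mu, Nt, Nx]
print(round(topological_charge(x), 6))           # Q is integer-valued
```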

Can we do better?

  • Introduce two (invertible NNs) vNet and xNet:
    • vNet: $(x, F) \longrightarrow \left(s_{v}, t_{v}, q_{v}\right)$
    • xNet: $(x, v) \longrightarrow \left(s_{x}, t_{x}, q_{x}\right)$

  • Use these $(s, t, q)$ in the generalized MD update:
    • $\Gamma_{\theta}^{\pm}: (x, v) \xrightarrow{s_{v}, t_{v}, q_{v}} (x, v')$
    • $\Lambda_{\theta}^{\pm}: (x, v) \xrightarrow{s_{x}, t_{x}, q_{x}} (x', v)$
Figure 4: Generalized MD update where $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ are invertible NNs

L2HMC: Generalizing the MD Update

  • Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update

    1. $v' = \Gamma^{\pm}(x, v)$ (update $v$)

    2. $x' = x_{B} + \Lambda^{\pm}(x_{A}, v')$ (update first half: $x_{A}$)

    3. $x'' = x'_{A} + \Lambda^{\pm}(x'_{B}, v')$ (update other half: $x_{B}$)

    4. $v'' = \Gamma^{\pm}(x'', v')$ (update $v$)

  • Resample both $v \sim \mathcal{N}(0, \mathbb{1})$ and $d \sim \mathcal{U}(\pm)$ at the beginning of each trajectory
    • To ensure ergodicity + reversibility, we split the $x$ update into sequential (complementary) updates, as sketched below
    • Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$
Figure 5: Generalized MD update with $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ invertible NNs
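A hedged sketch of the two complementary $x$ sub-updates (NumPy; `x_net` is a stub standing in for the trained xNet, and the masking convention is illustrative, not the exact l2hmc-qcd one):

```python
import numpy as np

def x_net(x, v):
    # stand-in for the trained xNet; returns (s_x, t_x, q_x)
    return 0.1 * np.tanh(x), 0.1 * np.tanh(v), 0.1 * np.tanh(x * v)

def lambda_plus(x, v, eps):
    # forward x-update applied pointwise (the mask selects which sites stick)
    s, t, q = x_net(x, v)
    return x * np.exp(0.5 * eps * s) - 0.5 * eps * (v * np.exp(eps * q) + t)

def half_update(x, v, mask, eps):
    # x' = x_B + Lambda^+(x_A, v'): freeze the masked half, transform the rest
    return mask * x + (1 - mask) * lambda_plus(x, v, eps)

rng = np.random.default_rng(0)
x, v = rng.normal(size=8), rng.normal(size=8)
mask = (np.arange(8) % 2).astype(float)        # complementary binary masks
x1 = half_update(x, v, mask, eps=0.1)          # update first half, x_A
x2 = half_update(x1, v, 1.0 - mask, eps=0.1)   # update other half, x_B
```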

L2HMC: Leapfrog Layer

L2HMC Update

Important๐Ÿ‘จโ€๐Ÿ’ป Algorithm
  1. input: xx

    • Resample: vโˆผN(0,1)\textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1}); โ€‰โ€‰dโˆผU(ยฑ)\,\,{d\sim\mathcal{U}(\pm)}
    • Construct initial state: ฮพ=(x,v,ยฑ)\textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})
  2. forward: Generate proposal ฮพโ€ฒ\xi' by passing initial ฮพ\xi through NLFN_{\mathrm{LF}} leapfrog layers
    ฮพโ†’LF layerฮพ1โŸถโ‹ฏโŸถฮพNLF=ฮพโ€ฒ:=(xโ€ฒโ€ฒ,vโ€ฒโ€ฒ)\textcolor{#939393} \xi \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})

    • Accept / Reject: A(ฮพโ€ฒโˆฃฮพ)=min{1,ฯ€(ฮพโ€ฒ)ฯ€(ฮพ)โˆฃJ(ฮพโ€ฒ,ฮพ)โˆฃ}\begin{equation*} A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})= \mathrm{min}\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\} \end{equation*}
  3. backward (if training):

    • Evaluate the loss function Lโ†Lฮธ(ฮพโ€ฒ,ฮพ)\mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}) and backprop
  4. return: xi+1\textcolor{#AE81FF}{x}_{i+1}
    Evaluate MH criteria (1)(1) and return accepted config, xi+1โ†{xโ€ฒโ€ฒ w/ prob A(ฮพโ€ฒโ€ฒโˆฃฮพ)โœ…x w/ prob 1โˆ’A(ฮพโ€ฒโ€ฒโˆฃฮพ)๐Ÿšซ\textcolor{#AE81FF}{{x}_{i+1}}\gets \begin{cases} \textcolor{#f8f8f8}{\textcolor{#AE81FF}{x''}} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi''}|\textcolor{#939393}{\xi}) \hspace{26pt} โœ… \\ \textcolor{#939393}{\textcolor{#AE81FF}{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi''}|{\textcolor{#939393}{\xi}}) \hspace{10pt} ๐Ÿšซ \end{cases}

Figure 6: Leapfrog Layer used in generalized MD update
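A self-contained toy version of this train step (PyTorch; the network, the Gaussian action, and the ESJD-style surrogate loss are all stand-ins, not the l2hmc-qcd implementation). The log-scale is a function of $x$ only, so the $v$-update Jacobian is diagonal and $\log|\mathcal{J}|$ accumulates in closed form:

```python
import torch

dim, eps, n_lf = 2, 0.1, 5
net = torch.nn.Sequential(
    torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def S(x):                                   # toy action: standard Gaussian
    return 0.5 * (x ** 2).sum(-1)

def grad_S(x):                              # force, differentiable for backprop
    return torch.autograd.grad(S(x).sum(), x, create_graph=True)[0]

def forward(x):
    v = torch.randn_like(x)
    h0 = S(x) + 0.5 * (v ** 2).sum(-1)      # initial energy
    logdet = torch.zeros(x.shape[0])
    for _ in range(n_lf):
        s = net(x)                          # log-scale depends only on x
        v = v * torch.exp(0.5 * eps * s) - 0.5 * eps * grad_S(x)
        logdet = logdet + 0.5 * eps * s.sum(-1)   # exact diagonal Jacobian
        x = x + eps * v
    h1 = S(x) + 0.5 * (v ** 2).sum(-1)      # proposed energy
    acc = torch.exp(torch.clamp(h0 - h1 + logdet, max=0.0))  # A(xi'|xi)
    return x, acc

for step in range(10):
    x0 = torch.randn(64, dim).requires_grad_(True)
    x1, acc = forward(x0)
    loss = -(acc * ((x1 - x0) ** 2).sum(-1)).mean()  # ESJD-style surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()
```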

4D $SU(3)$ Model

Note๐Ÿ”— Link Variables
  • Write link variables Uฮผ(x)โˆˆSU(3)U_{\mu}(x) \in SU(3):

    Uฮผ(x)=exp[iโ€‰ฯ‰ฮผk(x)ฮปk]=eiQ,withQโˆˆsu(3) \begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}

    where ฯ‰ฮผk(x)\omega^{k}_{\mu}(x) โˆˆR\in \mathbb{R}, and ฮปk\lambda^{k} are the generators of SU(3)SU(3)

Tip๐Ÿƒโ€โ™‚๏ธโ€โžก๏ธ Conjugate Momenta
  • Introduce Pฮผ(x)=Pฮผk(x)ฮปkP_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k} conjugate to ฯ‰ฮผk(x)\omega^{k}_{\mu}(x)
Important๐ŸŸฅ Wilson Action

SG=โˆ’ฮฒ6โˆ‘Tr[Uฮผฮฝ(x)+Uฮผฮฝโ€ (x)] S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]

where Uฮผฮฝ(x)=Uฮผ(x)Uฮฝ(x+ฮผ^)Uฮผโ€ (x+ฮฝ^)Uฮฝโ€ (x)U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu}) U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)

Figure 7: Illustration of the lattice
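A sketch (NumPy/SciPy, illustrative only) of building one such link from real coefficients $\omega^{k}$ and the Gell-Mann generators, then checking that $U$ is unitary with $\det U = 1$:

```python
import numpy as np
from scipy.linalg import expm

# Gell-Mann matrices: generators of su(3)
l1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)
l2 = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]])
l3 = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 0]], dtype=complex)
l4 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=complex)
l5 = np.array([[0, 0, -1j], [0, 0, 0], [1j, 0, 0]])
l6 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=complex)
l7 = np.array([[0, 0, 0], [0, 0, -1j], [0, 1j, 0]])
l8 = np.diag([1, 1, -2]).astype(complex) / np.sqrt(3)
lam = np.stack([l1, l2, l3, l4, l5, l6, l7, l8])

rng = np.random.default_rng(0)
omega = rng.normal(size=8)                 # omega^k in R
Q = np.einsum("k,kij->ij", omega, lam)     # Q in su(3): traceless, Hermitian
U = expm(1j * Q)                           # U = exp(iQ) in SU(3)

assert np.allclose(U @ U.conj().T, np.eye(3), atol=1e-12)  # unitary
assert np.isclose(np.linalg.det(U), 1.0)                   # special
```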

HMC: 4D $SU(3)$

Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

  • $U$ update: $\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}}$, so $\frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$

    $$\begin{aligned} Q(\varepsilon) &= Q(0) + \varepsilon P(0) \Longrightarrow \\ -i\, \log U(\varepsilon) &= -i\, \log U(0) + \varepsilon P(0) \\ U(\varepsilon) &= e^{i\, \varepsilon P(0)}\, U(0) \Longrightarrow \\ \Lambda: U \longrightarrow U' &:= e^{i\varepsilon P'}\, U \end{aligned}$$

    $\varepsilon$ is the step size

  • $P$ update: $\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow$

    $$\begin{aligned} P(\varepsilon) &= P(0) - \varepsilon \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \varepsilon\, F[U] \\ \Gamma: P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U] \end{aligned}$$

    $F[U]$ is the force term

HMC: 4D $SU(3)$

  • Momentum Update: $\Gamma: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$

  • Link Update: $\Lambda: U \longrightarrow U' := e^{i\varepsilon P'}\, U$

  • We maintain a batch of $N_{b}$ lattices, all updated in parallel (see the sketch below)

    • U.dtype = complex128
    • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
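A toy-sized sketch (NumPy/SciPy) of the batched link update $\Lambda$ with exactly this layout; `scipy.linalg.expm` is not batched, so the explicit site loop here is purely illustrative:

```python
import numpy as np
from scipy.linalg import expm

Nb, Nt, Nx, Ny, Nz, eps = 2, 2, 2, 2, 2, 0.1
shape = (Nb, 4, Nt, Nx, Ny, Nz, 3, 3)

rng = np.random.default_rng(0)
A = rng.normal(size=shape) + 1j * rng.normal(size=shape)
P = 0.5 * (A + np.swapaxes(A.conj(), -1, -2))   # Hermitian momenta
P -= np.trace(P, axis1=-2, axis2=-1)[..., None, None] * np.eye(3) / 3  # traceless

U = np.broadcast_to(np.eye(3, dtype=complex), shape).copy()  # cold start

# Lambda: U -> U' = exp(i * eps * P) U, applied at every site in the batch
flatP = P.reshape(-1, 3, 3)
expP = np.stack([expm(1j * eps * p) for p in flatP]).reshape(shape)
U_new = expP @ U                                 # batched 3x3 matmul

assert np.allclose(U_new @ np.swapaxes(U_new.conj(), -1, -2),
                   np.eye(3), atol=1e-10)        # links stay unitary
```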

Networks 4D $SU(3)$

$U$-Network:

UNet: $(U, P) \longrightarrow \left(s_{U}, t_{U}, q_{U}\right)$

$P$-Network:

PNet: $(U, P) \longrightarrow \left(s_{P}, t_{P}, q_{P}\right)$ ⟵ let's look at this


$P$-Network (pt. 1)

  • input: $(U, F) := (e^{iQ}, F)$

    $$\begin{aligned} h_{0} &= \sigma\left(w_{Q} Q + w_{F} F + b\right) \\ h_{1} &= \sigma\left(w_{1} h_{0} + b_{1}\right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\ z &:= \sigma\left(w_{n+1} h_{n} + b_{n+1}\right) \longrightarrow \end{aligned}$$

  • output: $(s_{P}, t_{P}, q_{P})$

    • $s_{P} = \lambda_{s}\tanh(w_{s} z + b_{s})$
    • $t_{P} = w_{t} z + b_{t}$
    • $q_{P} = \lambda_{q}\tanh(w_{q} z + b_{q})$
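A schematic PyTorch version of this architecture (flattened real inputs and hypothetical layer sizes; the real networks act on lattice-shaped $SU(3)$ data): an MLP trunk producing $z$, followed by the three heads for $(s_{P}, t_{P}, q_{P})$:

```python
import torch
import torch.nn as nn

class PNetwork(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.wQ = nn.Linear(dim, hidden)   # input weights for Q
        self.wF = nn.Linear(dim, hidden)   # input weights for the force F
        self.trunk = nn.Sequential(nn.Tanh(), nn.Linear(hidden, hidden), nn.Tanh())
        self.ws = nn.Linear(hidden, dim)   # scale head
        self.wt = nn.Linear(hidden, dim)   # translation head
        self.wq = nn.Linear(hidden, dim)   # transformation head
        self.lam_s = nn.Parameter(torch.ones(dim))   # trainable lambda_s
        self.lam_q = nn.Parameter(torch.ones(dim))   # trainable lambda_q

    def forward(self, Q, F):
        z = self.trunk(self.wQ(Q) + self.wF(F))      # h_0 -> ... -> z
        s = self.lam_s * torch.tanh(self.ws(z))      # s_P
        t = self.wt(z)                               # t_P
        q = self.lam_q * torch.tanh(self.wq(z))      # q_P
        return s, t, q

net = PNetwork(dim=16)
s, t, q = net(torch.randn(4, 16), torch.randn(4, 16))
```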

$P$-Network (pt. 2)

  • Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)$:

    • forward $(d = +)$:
      $\Gamma^{+}(U, P) := P_{+} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]$

    • backward $(d = -)$:
      $\Gamma^{-}(U, P) := P_{-} = e^{-\frac{\varepsilon}{2} s_{P}}\left\{P + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]\right\}$
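A quick numerical check (NumPy) that these two maps are inverses for fixed $(s_{P}, t_{P}, q_{P}, F, \varepsilon)$, i.e. $\Gamma^{-}\left[\Gamma^{+}(P)\right] = \Gamma^{+}\left[\Gamma^{-}(P)\right] = P$:

```python
import numpy as np

rng = np.random.default_rng(0)
P, F = rng.normal(size=8), rng.normal(size=8)
s, t, q = rng.normal(size=8), rng.normal(size=8), rng.normal(size=8)
eps = 0.1

gamma_plus = lambda P: P * np.exp(0.5 * eps * s) - 0.5 * eps * (F * np.exp(eps * q) + t)
gamma_minus = lambda P: np.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * np.exp(eps * q) + t))

assert np.allclose(gamma_minus(gamma_plus(P)), P)   # backward undoes forward
assert np.allclose(gamma_plus(gamma_minus(P)), P)   # forward undoes backward
```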

Results: 2D $U(1)$

Important๐Ÿ“ˆ Improvement

We can measure the performance by comparing ฯ„int\tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Interpretation

Figure 8: Illustration of how different observables (deviation in $x_{P}$, topological charge mixing, artificial influx of energy) evolve over a single L2HMC trajectory.

Interpretation

Average plaquette: $\langle x_{P}\rangle$ vs. LF step

Average energy: $H - \sum\log|\mathcal{J}|$
Figure 9: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D $SU(3)$ Results

  • Distribution of $\log|\mathcal{J}|$ over all chains, at each leapfrog step $N_{\mathrm{LF}}$ ($= 0, 1, \ldots, 8$), during training:
Figure 10: 100 train iters
Figure 11: 500 train iters
Figure 12: 1000 train iters

4D $SU(3)$ Results: $\delta U_{\mu\nu}$

Figure 13: The difference in the average plaquette, $\left|\delta U_{\mu\nu}\right|^{2}$, between the trained model and HMC

4D $SU(3)$ Results: $\delta U_{\mu\nu}$

Figure 14: The difference in the average plaquette, $\left|\delta U_{\mu\nu}\right|^{2}$, between the trained model and HMC

Next Steps

  • Further code development

  • Continue to use / test different network architectures

    • Gauge equivariant NNs for Uฮผ(x)U_{\mu}(x) update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

Note๐Ÿ™ Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References


Boyda, Denis et al. 2022. โ€œApplications of Machine Learning to Lattice Quantum Field Theory.โ€ In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. โ€œHMC with Normalizing Flows.โ€ PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. โ€œDeep Learning Hamiltonian Monte Carlo.โ€ In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
โ€”โ€”โ€”. 2022. โ€œLeapfrogLayers: A Trainable Framework for Effective Topological Sampling.โ€ PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. โ€œSnowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,โ€ September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

(a) Trained model
(b) Generic HMC
Figure 16: Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for the trained model vs. HMC

Plaquette analysis: $x_{P}$

Deviation from the $V \rightarrow \infty$ limit, $x_{P}^{\ast}$

Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted lines)

Figure 17: Plot showing how the average plaquette $\left\langle x_{P}\right\rangle$ varies over a single trajectory for models trained at different $\beta$, with varying trajectory lengths $N_{\mathrm{LF}}$

Loss Function

  • Want to maximize the expected squared charge difference:
    $$\mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\left[-\delta Q^{2}\left(\xi^{\ast}, \xi\right) \cdot A(\xi^{\ast}|\xi)\right]$$

  • Where:

    • $\delta Q$ is the tunneling rate: $\delta Q(\xi^{\ast}, \xi) = \left|Q^{\ast} - Q\right|$

    • $A(\xi^{\ast}|\xi)$ is the probability of accepting the proposal $\xi^{\ast}$:
      $$A(\xi^{\ast}|\xi) = \min\left(1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$$
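A minimal sketch (NumPy; array names are illustrative) of estimating this loss over a batch of chains, i.e. the negative of the acceptance-weighted squared charge difference:

```python
import numpy as np

def loss_theta(q_prop, q_init, acc_prob):
    """L = E[ -(delta Q)^2 * A(xi*|xi) ], estimated over a batch of chains."""
    dq = np.abs(q_prop - q_init)        # tunneling rate, delta Q
    return float(np.mean(-(dq ** 2) * acc_prob))

# toy batch: proposals that tunnel (dq = 1) with high acceptance lower the loss
print(loss_theta(np.array([1.0, 2.0]), np.array([0.0, 2.0]), np.array([0.9, 0.8])))
```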

Networks 2D $U(1)$

  • Stack gauge links as shape$\left(U_{\mu}\right)$ = [Nb, 2, Nt, Nx] $\in \mathbb{C}$

    $$x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$$

    with shape$\left(x_{\mu}\right)$ = [Nb, 2, Nt, Nx, 2] $\in \mathbb{R}$

  • $x$-Network:

    • $\psi_{\theta}: (x, v) \longrightarrow \left(s_{x}, t_{x}, q_{x}\right)$
  • $v$-Network:

    • $\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v}, t_{v}, q_{v}\right)$ ⟵ let's look at this
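A one-screen sketch (NumPy) of exactly this encoding, mapping angles of shape [Nb, 2, Nt, Nx] to $[\cos(x), \sin(x)]$ of shape [Nb, 2, Nt, Nx, 2]:

```python
import numpy as np

Nb, Nt, Nx = 4, 8, 8
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(Nb, 2, Nt, Nx))   # link angles
x_enc = np.stack([np.cos(x), np.sin(x)], axis=-1)      # network input
assert x_enc.shape == (Nb, 2, Nt, Nx, 2)
```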

$v$-Update

  • forward $(d = +)$:

    $\Gamma^{+}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]$

  • backward $(d = -)$:

    $\Gamma^{-}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}}\left\{v + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]\right\}$

$x$-Update

  • forward $(d = +)$:

    $\Lambda^{+}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]$

  • backward $(d = -)$:

    $\Lambda^{-}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}}\left\{x + \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]\right\}$

Lattice Gauge Theory (2D $U(1)$)

🔗 Link Variables

$U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C}$, where $x_{\mu}(n) \in [-\pi, \pi)$

🫸 Wilson Action

$$S_{\beta}(x) = \beta\sum_{P} \cos x_{P},$$

$$x_{P} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]$$

Note: $x_{P}$ is the phase of the product of links around a $1\times 1$ square, called a "plaquette"
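A sketch (NumPy) of this action in angle variables, with the plaquette sum $x_{P}$ built from periodic shifts; sign and normalization conventions follow the formulas above:

```python
import numpy as np

def plaquette_angles(x):
    """x_P = x_mu(n) + x_nu(n+mu) - x_mu(n+nu) - x_nu(n), periodic lattice."""
    x0, x1 = x[0], x[1]
    return x0 + np.roll(x1, -1, axis=0) - np.roll(x0, -1, axis=1) - x1

def wilson_action(x, beta):
    return beta * np.sum(np.cos(plaquette_angles(x)))

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(2, 8, 8))   # angles, shape [mu, Nt, Nx]
print(wilson_action(x, beta=1.0))
```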

2D Lattice


Figure 18: Jupyter Notebook

Annealing Schedule

  • Introduce an annealing schedule during the training phase (see the sketch below):

    $$\left\{\gamma_{t}\right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N}\right\}$$

    where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$

  • Note:

    • for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the height of the energy barriers $\Longrightarrow$
    • easier for our sampler to explore previously inaccessible regions of the phase space
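A minimal sketch (NumPy; the linear ramp is one simple choice, not necessarily the schedule used in practice) of such a schedule and the rescaled action $S_{t}(x) = \gamma_{t}\, S(x)$:

```python
import numpy as np

def annealing_schedule(n, gamma0=0.1):
    return np.linspace(gamma0, 1.0, n)   # gamma_0 < ... < gamma_N = 1

gammas = annealing_schedule(10)
S = lambda x: 0.5 * x ** 2               # toy action
for t, gamma_t in enumerate(gammas):
    S_t = lambda x, g=gamma_t: g * S(x)  # rescaled action at training step t
print(gammas)                            # [0.1 0.2 ... 1.0]
```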


Toy Example: GMM $\in \mathbb{R}^{2}$

Figure 19

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to the continuum limit
Figure 20: Increasing the physical resolution ($a \rightarrow 0$) allows us to make predictions about the numerical values of physical quantities in the continuum limit.


Footnotes

  1. Here, $\sim$ means "is distributed according to"

  2. We always start by resampling the momentum, $v_{0} \sim \mathcal{N}(0, \mathbb{1})$

  3. L2HMC: (Foreman, Jin, and Osborn 2021, 2022)

  4. For the simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} = A(\xi^{\ast}|\xi) \cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$

  5. $\sigma(\cdot)$ denotes an activation function

  6. $\lambda_{s}, \lambda_{q} \in \mathbb{R}$ are trainable parameters

  7. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$

  8. Where $\xi^{\ast}$ is the proposed configuration (prior to Accept / Reject)

  9. And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the Jacobian of the transformation from $\xi \rightarrow \xi^{\ast}$

  10. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$

Citation

For attribution, please cite this work as:
Foreman, Sam. 2023. "MLMC: Machine Learning Monte Carlo for Lattice Gauge Theory." Presentation at the 2023 International Symposium on Lattice Field Theory, July 31. https://indico.fnal.gov/event/57249/contributions/271305/. https://saforem2.github.io/lattice23.