MLMC: Machine Learning Monte Carlo

Sam Foreman
[email protected]

ALCF

2023-07-31

MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

2023-07-31 @ Lattice 2023

Overview

  1. Background: {MCMC,HMC}
    • Leapfrog Integrator
    • Issues with HMC
    • Can we do better?
  2. L2HMC: Generalizing MD
    • 4D SU(3) Model
    • Results
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

🎯 Goal

Generate independent samples $\{x_{i}\}$ such that $\{x_{i}\} \sim p(x) \propto e^{-S(x)}$, where $S(x)$ is the action (or potential energy)

  • Want to calculate observables $\mathcal{O}$:
    $\left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\, \mathcal{O}(x)\, p(x)$

If these were independent, we could approximate: $\left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})$
$\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}\left[\mathcal{O}(x)\right] \Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$


    Markov Chain Monte Carlo (MCMC)


    Instead, nearby configurations are correlated, and we incur a factor of the integrated autocorrelation time $\tau^{\mathcal{O}}_{\mathrm{int}}$: $\sigma_{\mathcal{O}}^{2} = \frac{\tau^{\mathcal{O}}_{\mathrm{int}}}{N}\,\mathrm{Var}\left[\mathcal{O}(x)\right]$

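As a rough illustration of how this correction is used in practice, here is a minimal numpy sketch (not from the talk) that estimates $\tau_{\mathrm{int}}$ with a naive windowed sum and plugs it into the error formula; the window length and variable names are illustrative.

```python
import numpy as np

def integrated_autocorr_time(obs: np.ndarray, window: int = 100) -> float:
    """Naive windowed estimate of tau_int for a 1D chain of measurements O(x_n)."""
    x = obs - obs.mean()
    var, n = x.var(), len(x)
    # normalized autocorrelation rho(t) for t = 1 .. window - 1
    rho = [np.dot(x[: n - t], x[t:]) / ((n - t) * var) for t in range(1, window)]
    return 1.0 + 2.0 * np.sum(rho)

obs = np.random.randn(10_000)                      # stand-in for a chain of measurements
tau_int = integrated_autocorr_time(obs)
# sigma_O^2 = (tau_int / N) * Var[O(x)]
sigma_O = np.sqrt(tau_int * obs.var() / len(obs))
print(f"tau_int ~ {tau_int:.2f}, sigma_O ~ {sigma_O:.4f}")
```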

      Hamiltonian Monte Carlo (HMC)

      • Want to (sequentially) construct a chain of states: $x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}$

        such that, as $N \rightarrow \infty$: $\left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow{N\rightarrow\infty} p(x) \propto e^{-S(x)}$

      🪄 Trick

      • Introduce fictitious momentum $v \sim \mathcal{N}(0, \mathbb{1})$
        • Normally distributed, independent of $x$, i.e.
          $p(x, v) = p(x)\, p(v) \propto e^{-S(x)}\, e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}v\right]} = e^{-H(x, v)}$

      Hamiltonian Monte Carlo (HMC)

      • Idea: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$ ❗

      • Write the joint distribution $p(x, v)$:
        $p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}$

      🔋 Hamiltonian Dynamics

      $H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H$

      Figure 1: Overview of HMC algorithm

      Leapfrog Integrator (HMC)

      🔋 Hamiltonian Dynamics

      $\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)$

      🐸 Leapfrog Step

      input $\left(x, v\right) \rightarrow \left(x', v'\right)$ output

      $\begin{aligned}
      \tilde{v} &:= \Gamma(x, v) = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
      x' &:= \Lambda(x, \tilde{v}) = x + \varepsilon\, \tilde{v} \\
      v' &:= \Gamma(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x')
      \end{aligned}$

      ⚠️ Warning!

      • Resample $v_{0} \sim \mathcal{N}(0, \mathbb{1})$ at the beginning of each trajectory

      Note: $\partial_{x} S(x)$ is the force
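
A minimal numpy sketch of a single leapfrog step as written above; the quadratic action in the example is purely illustrative.

```python
import numpy as np

def leapfrog_step(x, v, eps, grad_S):
    """One leapfrog step (x, v) -> (x', v') for H(x, v) = S(x) + v^T v / 2."""
    v_tilde = v - 0.5 * eps * grad_S(x)          # Gamma: half-step kick from the force
    x_new = x + eps * v_tilde                    # Lambda: full-step drift
    v_new = v_tilde - 0.5 * eps * grad_S(x_new)  # Gamma: second half-step kick
    return x_new, v_new

# illustrative harmonic action S(x) = x^2 / 2, so that grad_S(x) = x
x, v = np.array([1.0]), np.array([0.5])
x, v = leapfrog_step(x, v, eps=0.1, grad_S=lambda x: x)
```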

      HMC Update

      • We build a trajectory of $N_{\mathrm{LF}}$ leapfrog steps¹:
        $(x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v')$

      • And propose $x'$ as the next state in our chain

      $\begin{aligned}
      \Gamma: (x, v) \rightarrow v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\
      \Lambda: (x, v) \rightarrow x' &:= x + \varepsilon v
      \end{aligned}$

      • We then accept / reject $x'$ using the Metropolis-Hastings criterion,
        $A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}$


      1. We always start by resampling the momentum, $v_{0} \sim \mathcal{N}(0, \mathbb{1})$
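
Putting the pieces together, a schematic numpy version of one HMC update (not the l2hmc-qcd implementation), using the fact that leapfrog integration is volume-preserving so the Jacobian factor is 1 and the acceptance probability reduces to $\min\{1, e^{-\delta H}\}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_update(x, S, grad_S, eps=0.1, n_lf=10):
    """One HMC update: resample v0 ~ N(0, 1), integrate N_LF leapfrog steps, accept/reject."""
    v0 = rng.normal(size=x.shape)
    x_new, v_new = x.copy(), v0.copy()
    for _ in range(n_lf):
        v_new = v_new - 0.5 * eps * grad_S(x_new)
        x_new = x_new + eps * v_new
        v_new = v_new - 0.5 * eps * grad_S(x_new)
    # A(x'|x) = min{1, exp(-[H(x', v') - H(x, v0)])}
    dH = (S(x_new) + 0.5 * v_new @ v_new) - (S(x) + 0.5 * v0 @ v0)
    return x_new if rng.uniform() < np.exp(min(0.0, -dH)) else x

x = np.zeros(2)
for _ in range(100):
    x = hmc_update(x, S=lambda y: 0.5 * y @ y, grad_S=lambda y: y)
```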

        HMC Demo

        Figure 2: HMC Demo

        Issues with HMC

        • What do we want in a good sampler?
          • Fast mixing (small autocorrelations)
          • Fast burn-in (quick convergence)
        • Problems with HMC:
          • Energy levels selected randomly → slow mixing
          • Cannot easily traverse low-density zones → slow convergence

        HMC Samples with $\varepsilon = 0.25$

        HMC Samples with $\varepsilon = 0.5$
        Figure 3: HMC Samples generated with varying step sizes $\varepsilon$

        Topological Freezing

        Topological Charge: $Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}$

        note: $\left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor$
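
A small numpy sketch of these two definitions, assuming `x_plaq` holds the plaquette angles $x_{P}$ of a 2D U(1) configuration (for a genuine gauge configuration the sum is integer up to floating-point error):

```python
import numpy as np

def project_angle(x):
    """floor(x_P): map an angle into [-pi, pi)."""
    return x - 2 * np.pi * np.floor((x + np.pi) / (2 * np.pi))

def topological_charge(x_plaq):
    """Q = (1 / 2pi) * sum_P floor(x_P), rounded to the nearest integer."""
    return np.rint(project_angle(x_plaq).sum() / (2 * np.pi))
```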

        ⏳ Critical Slowing Down

        • $Q$ gets stuck!
          • as $\beta \longrightarrow \infty$:
            • $Q \longrightarrow \text{const.}$
            • $\delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \Longrightarrow$
          • # of configs required to estimate errors
            grows exponentially: $\tau_{\mathrm{int}}^{Q} \longrightarrow \infty$


        Note: $\delta Q \rightarrow 0$ at increasing $\beta$

        Can we do better?

        • Introduce two invertible NNs, vNet and xNet:
          • vNet: $(x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$
          • xNet: $(x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$

        • Use these $(s, t, q)$ in the generalized MD update:
          • $\Gamma_{\theta}^{\pm}: (x, v) \xrightarrow{s_{v},\, t_{v},\, q_{v}} (x, v')$
          • $\Lambda_{\theta}^{\pm}: (x, v) \xrightarrow{s_{x},\, t_{x},\, q_{x}} (x', v)$

        Figure 4: Generalized MD update where $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ are invertible NNs

          L2HMC: Generalizing the MD Update

          • Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update

            1. $v' = \Gamma^{\pm}(x, v)$  (update $v$)

            2. $x' = x_{B} + \Lambda^{\pm}(x_{A}, v')$  (update first half: $x_{A}$)

            3. $x'' = x'_{A} + \Lambda^{\pm}(x'_{B}, v')$  (update other half: $x_{B}$)

            4. $v'' = \Gamma^{\pm}(x'', v')$  (update $v$)

          • Resample both $v \sim \mathcal{N}(0, \mathbb{1})$ and $d \sim \mathcal{U}(\pm)$ at the beginning of each trajectory
            • To ensure ergodicity + reversibility, we split the $x$ update into sequential (complementary) updates
            • Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$
          Figure 5: Generalized MD update with $\Lambda_{\theta}^{\pm}$, $\Gamma_{\theta}^{\pm}$ invertible NNs
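
To make the structure of steps 1 through 4 concrete, here is a schematic numpy sketch of one generalized leapfrog layer in the forward ($d = +$) direction, using the $\Gamma^{+}$/$\Lambda^{+}$ expressions shown later in these slides; the network interfaces, the binary `mask` splitting $x$ into the two halves, and the toy usage are illustrative assumptions rather than the actual l2hmc-qcd code:

```python
import numpy as np

def gamma_plus(v, force, s, t, q, eps):
    """Generalized momentum update (forward direction)."""
    return v * np.exp(0.5 * eps * s) - 0.5 * eps * (force * np.exp(eps * q) + t)

def lambda_plus(x, v, s, t, q, eps):
    """Generalized position update (forward direction)."""
    return x * np.exp(0.5 * eps * s) - 0.5 * eps * (v * np.exp(eps * q) + t)

def leapfrog_layer(x, v, grad_S, v_net, x_net, mask, eps):
    """One forward leapfrog layer: update v, then x_A, then x_B, then v again.
    v_net(x, F) and x_net(x, v) return (s, t, q); mask == 1 selects the x_A half."""
    v = gamma_plus(v, grad_S(x), *v_net(x, grad_S(x)), eps)
    x = (1 - mask) * x + mask * lambda_plus(x, v, *x_net((1 - mask) * x, v), eps)   # update x_A
    x = mask * x + (1 - mask) * lambda_plus(x, v, *x_net(mask * x, v), eps)         # update x_B
    v = gamma_plus(v, grad_S(x), *v_net(x, grad_S(x)), eps)
    return x, v

# toy usage: constant "networks" and a checkerboard mask
zeros3 = lambda *_: (0.0, 0.0, 0.0)
x, v = np.random.randn(8), np.random.randn(8)
x, v = leapfrog_layer(x, v, grad_S=lambda y: y, v_net=zeros3, x_net=zeros3,
                      mask=(np.arange(8) % 2), eps=0.1)
```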

          L2HMC: Leapfrog Layer

          L2HMC Update

          👨‍💻 Algorithm

          1. input: $x$

            • Resample: $v \sim \mathcal{N}(0, \mathbb{1})$;  $d \sim \mathcal{U}(\pm)$
            • Construct initial state: $\xi = (x, v, \pm)$
          2. forward: Generate proposal $\xi'$ by passing initial $\xi$ through $N_{\mathrm{LF}}$ leapfrog layers
            $\xi \xrightarrow{\mathrm{LF\ layer}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \xi' := (x'', v'')$

            • Accept / Reject: $A(\xi'|\xi) = \min\left\{1, \frac{\pi(\xi')}{\pi(\xi)} \left|\mathcal{J}\left(\xi', \xi\right)\right|\right\}$
          3. backward (if training):

            • Evaluate the loss function¹ $\mathcal{L} \gets \mathcal{L}_{\theta}(\xi', \xi)$ and backprop
          4. return: $x_{i+1}$
            Evaluate the MH criterion and return the accepted config,
            $x_{i+1} \gets \begin{cases} x'' & \text{w/ prob } A(\xi''|\xi) \quad ✅ \\ x & \text{w/ prob } 1 - A(\xi''|\xi) \quad 🚫 \end{cases}$

          Figure 6: Leapfrog Layer used in generalized MD update

          1. For the simple $\mathbf{x} \in \mathbb{R}^{2}$ example, $\mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}$
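
A schematic PyTorch sketch of one training step following this algorithm, assuming a hypothetical `model(x, v)` that applies one leapfrog layer and returns the updated state together with its log-Jacobian contribution, and using the toy $\mathbb{R}^{2}$ loss from the footnote; none of these names correspond to the actual l2hmc-qcd API:

```python
import torch

def train_step(x, model, action, optimizer, n_lf=8):
    """One schematic L2HMC training step with the toy loss L = -A(xi'|xi) * (x' - x)^2."""
    v = torch.randn_like(x)                                  # 1. resample v ~ N(0, 1)
    x_prop, v_prop = x, v
    logdet = torch.zeros(x.shape[0])
    for _ in range(n_lf):                                    # 2. forward: N_LF leapfrog layers
        x_prop, v_prop, ld = model(x_prop, v_prop)
        logdet = logdet + ld
    h_init = action(x) + 0.5 * (v ** 2).sum(dim=-1)
    h_prop = action(x_prop) + 0.5 * (v_prop ** 2).sum(dim=-1)
    acc = torch.exp(torch.clamp(h_init - h_prop + logdet, max=0.0))   # A(xi'|xi)
    loss = -(acc * ((x_prop - x) ** 2).sum(dim=-1)).mean()   # 3. backward: evaluate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4. return: accept x'' with probability A, otherwise keep x
    accept = (torch.rand_like(acc) < acc).unsqueeze(-1)
    return torch.where(accept, x_prop.detach(), x), loss.item()
```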

            4D SU(3) Model

            🔗 Link Variables

            • Write link variables $U_{\mu}(x) \in SU(3)$:

              $U_{\mu}(x) = \exp\left[i\, \omega^{k}_{\mu}(x)\, \lambda^{k}\right] = e^{i Q}, \quad \text{with} \quad Q \in \mathfrak{su}(3)$

              where $\omega^{k}_{\mu}(x) \in \mathbb{R}$, and $\lambda^{k}$ are the generators of $SU(3)$

            🏃‍♂️‍➡️ Conjugate Momenta

            • Introduce $P_{\mu}(x) = P^{k}_{\mu}(x)\, \lambda^{k}$, conjugate to $\omega^{k}_{\mu}(x)$

            🟥 Wilson Action

            $S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]$

            where $U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x+\hat{\mu})\, U^{\dagger}_{\mu}(x+\hat{\nu})\, U^{\dagger}_{\nu}(x)$

            Figure 7: Illustration of the lattice

            HMC: 4D SU(3)

            Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow$

            • $U$ update: $\frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P$
              $\begin{aligned}
              Q(\varepsilon) &= Q(0) + \varepsilon P(0) \Longrightarrow \\
              -i\, \log U(\varepsilon) &= -i\, \log U(0) + \varepsilon P(0) \\
              U(\varepsilon) &= e^{i\, \varepsilon P(0)}\, U(0) \Longrightarrow \\
              \Lambda:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'}\, U
              \end{aligned}$
            • $P$ update: $\frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow$
              $\begin{aligned}
              P(\varepsilon) &= P(0) - \varepsilon \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \varepsilon\, F[U] \\
              \Gamma:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U]
              \end{aligned}$

            $\varepsilon$ is the step size

            $F[U]$ is the force term

            HMC: 4D SU(3)

            • Momentum Update: $\Gamma: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]$

            • Link Update: $\Lambda: U \longrightarrow U' := e^{i\varepsilon P'}\, U$

            • We maintain a batch of $N_{b}$ lattices, all updated in parallel

              • U.dtype = complex128
              • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
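
A small PyTorch sketch of what this batched layout and the two updates might look like; the lattice sizes are illustrative, and the force `F` is left as an input rather than computed from the action:

```python
import torch

Nb, Nt, Nx, Ny, Nz = 2, 4, 4, 4, 4
# batch of identity SU(3) gauge fields: U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
U = torch.eye(3, dtype=torch.complex128).expand(Nb, 4, Nt, Nx, Ny, Nz, 3, 3).clone()
P = torch.zeros_like(U)          # conjugate momenta (algebra-valued; zero here for illustration)

def link_update(U, P, eps):
    """Lambda: U -> exp(i * eps * P) @ U, applied to every link in the batch."""
    return torch.linalg.matrix_exp(1j * eps * P) @ U

def momentum_update(P, F, eps):
    """Gamma: P -> P - (eps / 2) * F[U], with F the force term."""
    return P - 0.5 * eps * F

U = link_update(U, P, eps=0.1)   # with P = 0 this leaves U unchanged
```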

            Networks 4D SU(3)

            $U$-Network:

            UNet: $(U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)$

            $P$-Network:

            PNet: $(U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)$


            ↑
            let’s look at this

            $P$-Network (pt. 1)

            • input¹: $\left(U, F\right) := (e^{i Q}, F)$
              $\begin{aligned}
              h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\
              h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\
              &\vdots \\
              h_{n} &= \sigma\left(w_{n-1} h_{n-2} + b_{n}\right) \\
              z &:= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow
              \end{aligned}$
            • output²: $(s_{P}, t_{P}, q_{P})$

              • $s_{P} = \lambda_{s} \tanh(w_{s} z + b_{s})$
              • $t_{P} = w_{t} z + b_{t}$
              • $q_{P} = \lambda_{q} \tanh(w_{q} z + b_{q})$

            1. $\sigma(\cdot)$ denotes an activation function
            2. $\lambda_{s}, \lambda_{q} \in \mathbb{R}$ are trainable parameters
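
A schematic PyTorch sketch of such a network; the layer sizes, the choice of ReLU for $\sigma(\cdot)$, and the flattened input shapes are illustrative assumptions, not the actual l2hmc-qcd architecture:

```python
import torch
import torch.nn as nn

class PNetwork(nn.Module):
    """Schematic P-network: (Q, F) -> (s_P, t_P, q_P) on flattened inputs."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.wQ = nn.Linear(dim, hidden)                 # w_Q Q + b
        self.wF = nn.Linear(dim, hidden)                 # w_F F
        self.body = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.ws, self.wt, self.wq = (nn.Linear(hidden, dim) for _ in range(3))
        self.lambda_s = nn.Parameter(torch.ones(1))      # trainable scale for s_P
        self.lambda_q = nn.Parameter(torch.ones(1))      # trainable scale for q_P

    def forward(self, Q, F):
        h0 = torch.relu(self.wQ(Q) + self.wF(F))         # sigma(w_Q Q + w_F F + b)
        z = self.body(h0)                                # hidden layers h_1 ... h_n -> z
        s = self.lambda_s * torch.tanh(self.ws(z))       # s_P = lambda_s tanh(w_s z + b_s)
        t = self.wt(z)                                   # t_P = w_t z + b_t
        q = self.lambda_q * torch.tanh(self.wq(z))       # q_P = lambda_q tanh(w_q z + b_q)
        return s, t, q

net = PNetwork(dim=8)
s_P, t_P, q_P = net(torch.randn(4, 8), torch.randn(4, 8))
```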

              $P$-Network (pt. 2)

              • Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right)$¹:

                • forward $(d = +)$: $\Gamma^{+}(U, P) := P_{+} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]$

                • backward $(d = -)$: $\Gamma^{-}(U, P) := P_{-} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}$


              1. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)$
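
A small numpy sketch of these two updates, with a numerical check that $\Gamma^{-}$ really inverts $\Gamma^{+}$ for fixed $(F, s_{P}, t_{P}, q_{P})$; the scalar values are illustrative:

```python
import numpy as np

def gamma_plus(P, F, s, t, q, eps):
    """Forward update: P_+ = P * exp(eps/2 * s_P) - eps/2 * (F * exp(eps * q_P) + t_P)."""
    return P * np.exp(0.5 * eps * s) - 0.5 * eps * (F * np.exp(eps * q) + t)

def gamma_minus(P, F, s, t, q, eps):
    """Backward update: P_- = exp(-eps/2 * s_P) * (P + eps/2 * (F * exp(eps * q_P) + t_P))."""
    return np.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * np.exp(eps * q) + t))

# numerical check that (Gamma^+)^{-1} = Gamma^- for fixed (F, s, t, q)
P, F = np.random.randn(5), np.random.randn(5)
s, t, q, eps = 0.3, 0.1, -0.2, 0.1
assert np.allclose(gamma_minus(gamma_plus(P, F, s, t, q, eps), F, s, t, q, eps), P)
```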

                Results: 2D U(1)

                📈 Improvement

                We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the trained model vs. HMC.

                Note: lower is better

                Interpretation

                Deviation in $x_{P}$

                Topological charge mixing

                Artificial influx of energy

                Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory.

                Interpretation

                Average plaquette: $\langle x_{P}\rangle$ vs. LF step

                Average energy: $H - \sum\log|\mathcal{J}|$
                Figure 9: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

                4D SU(3) Results

                • Distribution of $\log|\mathcal{J}|$ over all chains, at each leapfrog step $N_{\mathrm{LF}}$ $(= 0, 1, \ldots, 8)$ during training:
                Figure 10: 100 train iters
                Figure 11: 500 train iters
                Figure 12: 1000 train iters

                4D SU(3) Results: $\delta U_{\mu\nu}$

                Figure 13: The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$ between the trained model and HMC

                4D SU(3) Results: $\delta U_{\mu\nu}$

                Figure 14: The difference in the average plaquette $\left|\delta U_{\mu\nu}\right|^{2}$ between the trained model and HMC

                Next Steps

                • Further code development

                  • saforem2/l2hmc-qcd
                • Continue to use / test different network architectures

                  • Gauge equivariant NNs for the $U_{\mu}(x)$ update
                • Continue to test different loss functions for training

                • Scaling:

                  • Lattice volume
                  • Network size
                  • Batch size
                  • # of GPUs

                Thank you!

                 

                samforeman.me

                saforem2

                @saforem2

                [email protected]

                🙏 Acknowledgements

                This research used resources of the Argonne Leadership Computing Facility,
                which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


                Acknowledgements

                • Links:
                  • Link to github
                  • reach out!
                • References:
                  • Link to slides
                    • link to github with slides
                  • (Foreman et al. 2022; Foreman, Jin, and Osborn 2022, 2021)
                  • (Boyda et al. 2022; Shanahan et al. 2022)
                • Huge thank you to:
                  • Yannick Meurice
                  • Norman Christ
                  • Akio Tomiya
                  • Nobuyuki Matsumoto
                  • Richard Brower
                  • Luchang Jin
                  • Chulwoo Jung
                  • Peter Boyle
                  • Taku Izubuchi
                  • Denis Boyda
                  • Dan Hackett
                  • ECP-CSD group
                  • ALCF Staff + Datascience Group

                Links

                • saforem2/l2hmc-qcd

                • 📊 slides (Github: saforem2/lattice23)

                References

                • Title Slide Background (worms) animation
                  • Github: saforem2/grid-worms-animation
                • Link to HMC demo

                References


                Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
                Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
                Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
                ———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
                Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

                Extras

                Integrated Autocorrelation Time

                Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

                Comparison

                (a) Trained model
                (b) Generic HMC
                Figure 16: Comparison of $\langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i}$ for the trained model (a) vs. HMC (b)

                Plaquette analysis: $x_{P}$

                Deviation from the $V\rightarrow\infty$ limit, $x_{P}^{\ast}$

                Average $\langle x_{P}\rangle$, with $x_{P}^{\ast}$ (dotted lines)

                Figure 17: Plot showing how the average plaquette $\left\langle x_{P}\right\rangle$ varies over a single trajectory for models trained at different $\beta$, with varying trajectory lengths $N_{\mathrm{LF}}$

                Loss Function

                • Want to maximize the expected squared charge difference¹:
                  $\mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = \mathbb{E}_{p(\xi)}\big[-\delta Q^{2}\left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big]$

                • Where:

                  • $\delta Q$ is the tunneling rate: $\delta Q(\xi^{\ast},\xi) = \left|Q^{\ast} - Q\right|$

                  • $A(\xi^{\ast}|\xi)$ is the probability² of accepting the proposal $\xi^{\ast}$:
                    $A(\xi^{\ast}|\xi) = \min\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$


                1. Where $\xi^{\ast}$ is the proposed configuration (prior to Accept / Reject)
                2. And $\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|$ is the Jacobian of the transformation $\xi \rightarrow \xi^{\ast}$
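
A minimal PyTorch sketch of this loss, assuming the topological charges and acceptance probabilities have already been computed per chain:

```python
import torch

def l2hmc_loss(q_init, q_prop, acc_prob):
    """Minimize the negative of E[ delta_Q^2 * A(xi*|xi) ], i.e. maximize expected tunneling."""
    delta_q = torch.abs(q_prop - q_init)        # delta Q = |Q* - Q|
    return -(delta_q ** 2 * acc_prob).mean()

loss = l2hmc_loss(torch.tensor([0.0, 1.0]), torch.tensor([1.0, 1.0]), torch.tensor([0.8, 0.9]))
```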

                  Networks 2D U(1)

                  • Stack gauge links as shape$\left(U_{\mu}\right)$ = [Nb, 2, Nt, Nx] $\in \mathbb{C}$

                    $x_{\mu}(n) := \left[\cos(x), \sin(x)\right]$

                    with shape$\left(x_{\mu}\right)$ = [Nb, 2, Nt, Nx, 2] $\in \mathbb{R}$ (see the sketch below)

                  • $x$-Network:

                    • $\psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)$
                  • $v$-Network:

                    • $\varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)$ ⟵ let's look at this
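
A small numpy sketch of the stacking described above; the lattice sizes are illustrative:

```python
import numpy as np

Nb, Nt, Nx = 4, 8, 8
x = np.random.uniform(-np.pi, np.pi, size=(Nb, 2, Nt, Nx))    # link angles x_mu(n)
U = np.exp(1j * x)                                            # shape [Nb, 2, Nt, Nx], complex
x_stack = np.stack([np.cos(x), np.sin(x)], axis=-1)           # shape [Nb, 2, Nt, Nx, 2], real
```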

                  $v$-Update¹

                  • forward $(d = +)$:

                  $\Gamma^{+}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]$

                  • backward $(d = -)$:

                  $\Gamma^{-}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}$


                  1. Note that $\left(\Gamma^{+}\right)^{-1} = \Gamma^{-}$, i.e. $\Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)$

                    $x$-Update

                    • forward $(d = +)$:

                    $\Lambda^{+}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]$

                    • backward $(d = -)$:

                    $\Lambda^{-}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}$

                    Lattice Gauge Theory (2D U(1))

                    🔗 Link Variables

                    $U_{\mu}(n) = e^{i x_{\mu}(n)} \in \mathbb{C}$, where $x_{\mu}(n) \in [-\pi, \pi)$

                    🫸 Wilson Action

                    $S_{\beta}(x) = \beta\sum_{P} \cos x_{P},$

                    $x_{P} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu}) - x_{\nu}(n)\right]$

                    Note: $x_{P}$ is the product of links around a $1\times 1$ square, called a "plaquette"
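
A small numpy sketch of these definitions for a single configuration, assuming axis 0 of the lattice corresponds to direction $\mu = 0$ and using periodic boundary conditions via `np.roll`; the shapes are illustrative:

```python
import numpy as np

def plaquettes(x):
    """x_P = x_mu(n) + x_nu(n + mu) - x_mu(n + nu) - x_nu(n), for x.shape = [2, Nt, Nx]."""
    x0, x1 = x[0], x[1]                     # link angles in directions mu = 0 and nu = 1
    return x0 + np.roll(x1, -1, axis=0) - np.roll(x0, -1, axis=1) - x1

def wilson_action(x, beta):
    """S_beta(x) = beta * sum_P cos(x_P), as written above."""
    return beta * np.cos(plaquettes(x)).sum()

x = np.random.uniform(-np.pi, np.pi, size=(2, 8, 8))
print(wilson_action(x, beta=4.0))
```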

                    2D Lattice

                    Figure 18: Jupyter Notebook

                    Annealing Schedule

                    • Introduce an annealing schedule during the training phase:

                      $\left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}$

                      where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $\left|\gamma_{t+1} - \gamma_{t}\right| \ll 1$

                    • Note:

                      • for $\left|\gamma_{t}\right| < 1$, this rescaling helps to reduce the height of the energy barriers ⟹
                      • easier for our sampler to explore previously inaccessible regions of the phase space (see the sketch below)
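
A minimal sketch of one possible such schedule, assuming (as I read this slide) that epoch $t$ trains against the rescaled action $\gamma_{t}\, S(x)$, i.e. an effective $\beta_{t} = \gamma_{t}\,\beta$; the endpoints, number of epochs, and target $\beta$ are all illustrative:

```python
import numpy as np

n_epochs = 1000
gammas = np.linspace(0.1, 1.0, n_epochs)      # gamma_0 < gamma_1 < ... < gamma_N = 1

for t, gamma in enumerate(gammas):
    beta_t = gamma * 4.0                      # hypothetical target beta = 4.0
    # ... train one epoch against the rescaled action gamma_t * S(x) (i.e. at beta_t) ...
```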


                    Toy Example: GMM $\in \mathbb{R}^{2}$

                    Figure 19

                    Physical Quantities

                    • To estimate physical quantities, we:
                      • Calculate physical observables at increasing spatial resolution
                      • Perform extrapolation to continuum limit
                    Figure 20: Increasing the physical resolution ($a \rightarrow 0$) allows us to make predictions about numerical values of physical quantities in the continuum limit.
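
A minimal numpy sketch of such a continuum extrapolation, assuming leading $\mathcal{O}(a^{2})$ discretization errors; the measurements are made-up numbers purely for illustration:

```python
import numpy as np

# hypothetical measurements of an observable at decreasing lattice spacing a (made-up numbers)
a = np.array([0.20, 0.15, 0.10, 0.05])
obs = np.array([1.42, 1.31, 1.24, 1.19])

# leading-order ansatz obs(a) = obs_cont + c * a^2; fit and read off the a -> 0 intercept
slope, obs_cont = np.polyfit(a ** 2, obs, deg=1)
print(f"continuum estimate: {obs_cont:.3f}")
```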
