MLMC: Machine Learning Monte Carlo
Overview
Hamiltonian Monte Carlo (HMC)
Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}
such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}
Leapfrog Integrator (HMC)
HMC Update
We build a trajectory of N_{\mathrm{LF}} leapfrog steps³ \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}
And propose x' as the next state in our chain
\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}
- We then accept / reject x' using the Metropolis-Hastings criterion,
A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
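As a toy illustration, here is a minimal numpy sketch of this update for the simple action S(x) = \frac{1}{2}\|x\|^{2} (the function and variable names are ours, not from any particular codebase); since the leapfrog Jacobian is 1, the accept / reject step reduces to comparing Hamiltonians:

```python
import numpy as np

def hmc_step(x, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    """One HMC update for the toy action S(x) = 0.5 * ||x||^2 (standard normal target)."""
    S = lambda z: 0.5 * np.sum(z ** 2)
    grad_S = lambda z: z                            # dS/dx for this toy action
    v = rng.normal(size=x.shape)                    # resample momentum v ~ N(0, 1)
    x_new, v_new = x.copy(), v.copy()
    H0 = S(x_new) + 0.5 * np.sum(v_new ** 2)        # initial Hamiltonian
    for _ in range(n_lf):                           # leapfrog: Gamma (half), Lambda, Gamma (half)
        v_new = v_new - 0.5 * eps * grad_S(x_new)
        x_new = x_new + eps * v_new
        v_new = v_new - 0.5 * eps * grad_S(x_new)
    H1 = S(x_new) + 0.5 * np.sum(v_new ** 2)        # proposed Hamiltonian
    if rng.uniform() < min(1.0, np.exp(H0 - H1)):   # Metropolis-Hastings accept / reject
        return x_new                                # accept x'
    return x                                        # reject: keep x

# Build a chain x_0 -> x_1 -> ... -> x_N
chain = [np.zeros(2)]
for _ in range(1000):
    chain.append(hmc_step(chain[-1]))
```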
Issues with HMC
- What do we want in a good sampler?
- Fast mixing (small autocorrelations)
- Fast burn-in (quick convergence)
- Problems with HMC:
- Energy levels selected randomly \rightarrow slow mixing
- Cannot easily traverse low-density zones \rightarrow slow convergence
Topological Freezing
Can we do better?
- Introduce two (invertible) NNs, vNet and xNet⁴:
  - vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
  - xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
- Use these (s, t, q) in the generalized MD update:
- \Gamma_{\theta}^{\pm} : ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
- \Lambda_{\theta}^{\pm} : (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)
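A rough PyTorch sketch of how these outputs enter the forward (d = +) updates, matching the expressions written out in the Extras; `vnet`, `xnet`, and the force F here are stand-ins, and the masking and full Jacobian bookkeeping of the real implementation are omitted:

```python
import torch

def Gamma_fwd(x, v, F, vnet, eps):
    """Gamma_theta^+: generalized v update driven by vNet outputs (s_v, t_v, q_v)."""
    s, t, q = vnet(x, F)                        # vNet: (x, F) -> (s_v, t_v, q_v)
    v_new = v * torch.exp(0.5 * eps * s) - 0.5 * eps * (F * torch.exp(eps * q) + t)
    logdet = 0.5 * eps * s.sum()                # log|dv'/dv|, accumulated for the accept step
    return v_new, logdet

def Lambda_fwd(x, v, xnet, eps):
    """Lambda_theta^+: generalized x update driven by xNet outputs (s_x, t_x, q_x)."""
    s, t, q = xnet(x, v)                        # xNet: (x, v) -> (s_x, t_x, q_x)
    x_new = x * torch.exp(0.5 * eps * s) - 0.5 * eps * (v * torch.exp(eps * q) + t)
    logdet = 0.5 * eps * s.sum()
    return x_new, logdet

# Stand-in networks: any callables returning (s, t, q) with the same shape as x work here.
dummy_net = lambda a, b: (torch.tanh(a), torch.tanh(b), torch.tanh(a * b))
x, v = torch.randn(8), torch.randn(8)
F = x                                           # e.g. F = dS/dx = x for the toy action S(x) = x^2 / 2
v1, _ = Gamma_fwd(x, v, F, dummy_net, eps=0.1)
x1, _ = Lambda_fwd(x, v1, dummy_net, eps=0.1)
```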
4D SU(3) Results
- Distribution of \log|\mathcal{J}| over all chains at each leapfrog step N_{\mathrm{LF}} (= 0, 1, \ldots, 8) during training:
4D SU(3) Results: \delta U_{\mu\nu}
Next Steps
- Further code development
- Continue to use / test different network architectures
  - Gauge-equivariant NNs for the U_{\mu}(x) update
- Continue to test different loss functions for training
- Scaling:
  - Lattice volume
  - Network size
  - Batch size
  - # of GPUs
Thank you!
Acknowledgements
- Huge thank you to:
- Yannick Meurice
- Norman Christ
- Akio Tomiya
- Nobuyuki Matsumoto
- Richard Brower
- Luchang Jin
- Chulwoo Jung
- Peter Boyle
- Taku Izubuchi
- Denis Boyda
- Dan Hackett
- ECP-CSD group
- ALCF Staff + Datascience Group
Links
- Slides (GitHub: saforem2/lattice23)
Extras
Comparison
Loss Function
Want to maximize the expected squared charge difference⁹: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}
Where:
\delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}
A(\xi^{\ast}|\xi) is the probability¹⁰ of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
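A minimal PyTorch sketch of this loss; the tensor names (`q_init`, `q_prop`, `acc_prob`) are assumptions, standing in for quantities produced by the sampler:

```python
import torch

def loss_fn(q_init, q_prop, acc_prob):
    """L_theta(xi*, xi) = E_{p(xi)}[ -dQ^2(xi*, xi) * A(xi* | xi) ], estimated over a batch.

    q_init, q_prop: topological charges Q, Q* of the initial / proposed configurations
    acc_prob:       acceptance probabilities A(xi* | xi)
    """
    dq = torch.abs(q_prop - q_init)        # tunneling rate dQ = |Q* - Q|
    return (-dq ** 2 * acc_prob).mean()    # minimizing this maximizes E[dQ^2 * A]
```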
Networks 2D U(1)
Stack gauge links as \mathrm{shape}\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}, and x_{\mu}(n) := \left[\cos(x), \sin(x)\right] with \mathrm{shape}\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}
x-Network:
- \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
v-Network:
- \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \hspace{2pt}\longleftarrow let's look at this
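Before turning to the v-update, a minimal sketch of the link stacking described above (the batch and lattice sizes below are placeholders):

```python
import math
import torch

def stack_links(x):
    """Map U(1) link phases x, shape [Nb, 2, Nt, Nx], to real-valued
    network inputs [cos(x), sin(x)], shape [Nb, 2, Nt, Nx, 2]."""
    return torch.stack([torch.cos(x), torch.sin(x)], dim=-1)

# Example: a batch of Nb = 4 configurations on an Nt x Nx = 16 x 16 lattice
x = 2 * math.pi * torch.rand(4, 2, 16, 16)
print(stack_links(x).shape)   # torch.Size([4, 2, 16, 16, 2])
```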
v-Update¹¹
- forward (d = \textcolor{#FF5252}{+}):
\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]
- backward (d = \textcolor{#1A8FFF}{-}):
\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
x-Update
- forward (d = \textcolor{#FF5252}{+}):
\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]
- backward (d = \textcolor{#1A8FFF}{-}):
\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}
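A quick numerical check that \Lambda^{-} inverts \Lambda^{+} at fixed (s_{x}, t_{x}, q_{x}), using the two expressions above verbatim (random tensors stand in for the network outputs):

```python
import torch

def Lambda_fwd(x, v, s, t, q, eps):
    """Lambda^+ as written above."""
    return x * torch.exp(0.5 * eps * s) - 0.5 * eps * (v * torch.exp(eps * q) + t)

def Lambda_bwd(x, v, s, t, q, eps):
    """Lambda^-: the algebraic inverse of Lambda^+ for fixed (v, s, t, q)."""
    return torch.exp(-0.5 * eps * s) * (x + 0.5 * eps * (v * torch.exp(eps * q) + t))

x, v = torch.randn(10), torch.randn(10)
s, t, q = torch.randn(10), torch.randn(10), torch.randn(10)
xp = Lambda_fwd(x, v, s, t, q, eps=0.1)
print(torch.allclose(Lambda_bwd(xp, v, s, t, q, eps=0.1), x))   # True
```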
Annealing Schedule
Introduce an annealing schedule during the training phase:
\left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}
where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1
Note:
- for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
- easier for our sampler to explore previously inaccessible regions of the phase space
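A minimal sketch of such a schedule (the linear spacing and era count below are assumptions, not the values used in training):

```python
import numpy as np

def annealing_schedule(n_eras, gamma_0=0.1):
    """Monotone schedule gamma_0 < gamma_1 < ... < gamma_N = 1, with |gamma_{t+1} - gamma_t| << 1."""
    return np.linspace(gamma_0, 1.0, n_eras + 1)

gammas = annealing_schedule(n_eras=50)
# During training era t, target the rescaled (flattened) distribution
#   p_t(x) ∝ exp(-gamma_t * S(x)),
# so energy barriers are lower for gamma_t < 1 and the sampler can cross between modes.
```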
Physical Quantities
- To estimate physical quantities, we:
- Calculate physical observables at increasing spatial resolution
- Perform extrapolation to continuum limit
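A schematic of the extrapolation step; all numbers below are placeholders chosen purely to illustrate a leading-order \mathcal{O}(a^{2}) fit, not results from this work:

```python
import numpy as np

# Placeholder values: an observable O measured at several lattice spacings a,
# assuming leading discretization errors of O(a^2).
a = np.array([0.15, 0.12, 0.09, 0.06])          # lattice spacings (e.g. in fm)
O = np.array([0.412, 0.398, 0.389, 0.383])      # measured observable at each spacing

slope, intercept = np.polyfit(a ** 2, O, deg=1) # linear fit O(a) ~ O_cont + c * a^2
print(f"continuum estimate (a -> 0): {intercept:.3f}")
```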
Extra
Footnotes
Here, \sim means "is distributed according to"
We always start by resampling the momentum, v_{0} \sim \mathcal{N}(0, \mathbb{1})
For the simple \mathbf{x} \in \mathbb{R}^{2} example, \mathcal{L}_{\theta} = A(\xi^{\ast}|\xi)\cdot \left(\mathbf{x}^{\ast} - \mathbf{x}\right)^{2}
\sigma(\cdot) denotes an activation function
\lambda_{s}, \lambda_{q} \in \mathbb{R} are trainable parameters
Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(U, P)\right] = \Gamma^{-}\left[\Gamma^{+}(U, P)\right] = (U, P)
Where \xi^{\ast} is the proposed configuration (prior to Accept / Reject)
And \left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right| is the Jacobian of the transformation from \xi \rightarrow \xi^{\ast}
Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)