2023-07-31
MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory
Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}
2023-07-31 @ Lattice 2023
🎯 Goal
Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)
If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x)
\right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}
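The \sigma_{\mathcal{O}} \propto 1/\sqrt{N} scaling can be checked directly; a minimal NumPy sketch (illustrative, not from the talk's codebase):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(samples):
    """Monte Carlo estimate of <O> and its statistical error for
    independent samples: sigma_O = sqrt(Var[O] / N)."""
    n = len(samples)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n)

# Quadrupling N should roughly halve sigma_O.
_, err_small = mc_estimate(rng.normal(size=1_000))
_, err_large = mc_estimate(rng.normal(size=4_000))
```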
Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}
such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}
Trick
Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}
Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}
Hamiltonian Dynamics
H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H
🐸 Leapfrog Step
input \left(x, v\right) \rightarrow \left(x', v'\right) output
\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}
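A single leapfrog step (\Gamma, \Lambda, \Gamma) is easy to write down and sanity-check; the sketch below uses a toy harmonic action S(x) = x^{2}/2 (all names are illustrative):

```python
import numpy as np

def leapfrog(x, v, eps, grad_S):
    """One leapfrog step (Gamma, Lambda, Gamma) for H = S(x) + v^T v / 2."""
    v_half = v - 0.5 * eps * grad_S(x)           # Gamma: half-step momentum kick
    x_new = x + eps * v_half                     # Lambda: full-step position drift
    v_new = v_half - 0.5 * eps * grad_S(x_new)   # Gamma: final half-kick
    return x_new, v_new

grad_S = lambda x: x   # toy action S(x) = x^2 / 2

def hamiltonian(x, v):
    return 0.5 * x**2 + 0.5 * v**2

x, v = 1.0, 0.5
h0 = hamiltonian(x, v)
for _ in range(10):
    x, v = leapfrog(x, v, eps=0.1, grad_S=grad_S)
h1 = hamiltonian(x, v)
# Leapfrog is symplectic: the energy error stays O(eps^2) along the trajectory.
```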
β οΈ Warning!
Note: \partial_{x} S(x) is the force
We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}
And propose x' as the next state in our chain
\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}
Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}
note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
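Q and the projection \left\lfloor x_{P}\right\rfloor can be implemented directly in NumPy on a 2D U(1) lattice; a sketch (conventions and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def plaquettes(x):
    """x has shape [2, Nt, Nx]: link phases x_mu(n) for mu = 0, 1.
    x_P = x_0(n) + x_1(n + hat{0}) - x_0(n + hat{1}) - x_1(n)."""
    x0, x1 = x[0], x[1]
    return (x0
            + np.roll(x1, -1, axis=0)   # x_1(n + hat{0})
            - np.roll(x0, -1, axis=1)   # x_0(n + hat{1})
            - x1)

def topological_charge(x):
    x_p = plaquettes(x)
    # floor-projection of x_P into (-pi, pi]
    proj = x_p - 2 * np.pi * np.floor((x_p + np.pi) / (2 * np.pi))
    return proj.sum() / (2 * np.pi)

x = rng.uniform(-np.pi, np.pi, size=(2, 8, 8))
q = topological_charge(x)
# On a periodic lattice the raw plaquette sum telescopes to zero,
# so Q is an integer.
```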
⏳ Critical Slowing Down
vNet and xNet:
vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
Introduce d \sim \mathcal{U}(\pm) to determine the direction of our update
\textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v}) \hspace{46pt} update v
\textcolor{#AE81FF}{x'} = x_{B}\,+\,\Lambda^{\pm}(x_{A}, {v'}) \hspace{10pt} update first half: x_{A}
\textcolor{#AE81FF}{x''} = x'_{A}\,+\,\Lambda^{\pm}(x'_{B}, {v'}) \hspace{8pt} update other half: x_{B}
\textcolor{#07B875}{v''} = \Gamma^{\pm}({x''}, \textcolor{#07B875}{v'}) \hspace{36pt} update v
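The two half-updates can be sketched with a binary site mask: one half (x_{A}) is updated while the other half (x_{B}) stays frozen, then the roles swap. Below, a plain leapfrog drift stands in for the network-augmented \Lambda^{\pm}; the mask and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def split_update(x, v, mask, eps):
    """Update x_A = mask * x first (x_B frozen), then x_B using the
    already-updated x_A; plain x + eps*v stands in for Lambda."""
    x1 = (1 - mask) * x + mask * (x + eps * v)       # update first half, x_A
    x2 = mask * x1 + (1 - mask) * (x1 + eps * v)     # update other half, x_B
    return x2

x = rng.normal(size=8)
v = rng.normal(size=8)
mask = (np.arange(8) % 2).astype(float)   # alternating A/B sites
d = rng.choice([-1.0, 1.0])               # direction d ~ U(+/-)
x_new = split_update(x, d * v, mask, eps=0.1)
```

With this trivial stand-in for \Lambda, the two half-updates together reproduce a full-site update, which makes the masking easy to verify.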
👨‍💻 Algorithm
input: x
forward: Generate proposal \xi' by passing initial \xi through N_{\mathrm{LF}} leapfrog layers:
\textcolor{#939393}{\xi} \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})
backward (if training): backpropagate the loss \mathcal{L}_{\theta} and update the network weights
return: \textcolor{#AE81FF}{x}_{i+1}; evaluate the MH criteria (1) and return the accepted config, \textcolor{#AE81FF}{{x}_{i+1}}\gets
\begin{cases}
\textcolor{#AE81FF}{x''} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi''}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅
\\
\textcolor{#AE81FF}{x} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi''}|\textcolor{#939393}{\xi}) \hspace{10pt} 🚫
\end{cases}
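The accept/reject step above is a standard Metropolis-Hastings test; a minimal sketch on a Gaussian toy target p(x) \propto e^{-x^{2}/2} (target, step size, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def mh_step(x, log_p, step=0.5, log_det_jac=0.0):
    """Propose x'' and accept with prob A = min(1, p(xi'')/p(xi) * |jac|)."""
    x_prop = x + step * rng.normal()
    log_a = log_p(x_prop) - log_p(x) + log_det_jac
    if np.log(rng.uniform()) < min(0.0, log_a):
        return x_prop, True    # accepted
    return x, False            # rejected: keep the current state

log_p = lambda x: -0.5 * x**2  # S(x) = x^2 / 2

x, n_acc, chain = 0.0, 0, []
for _ in range(5000):
    x, acc = mh_step(x, log_p)
    n_acc += acc
    chain.append(x)
chain = np.array(chain)
# The chain samples the target, but successive configs are correlated
# (the tau_int factor discussed above).
```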
🔗 Link Variables
Write link variables U_{\mu}(x) \in SU(3):
\begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}
where \omega^{k}_{\mu}(x) \in \mathbb{R}, and \lambda^{k} are the generators of SU(3)
Conjugate Momenta
Wilson Action
S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]
where U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu}) U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)
Hamiltonian: H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow
Momentum Update: \textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]
Link Update: \textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad
We maintain a batch of Nb lattices, all updated in parallel:
dtype = complex128
shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
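For example, allocating such a batch in NumPy (the sizes and the cold-start identity initialization are illustrative):

```python
import numpy as np

# A batch of Nb SU(3) lattices, updated in parallel.
Nb, Nt, Nx, Ny, Nz = 4, 8, 8, 8, 8
U = np.zeros((Nb, 4, Nt, Nx, Ny, Nz, 3, 3), dtype=np.complex128)

# Initialize every link to the 3x3 identity (a "cold" start).
U[..., np.arange(3), np.arange(3)] = 1.0
```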
Network (pt. 1)
input: \hspace{7pt}\left(U, F\right) := (e^{i Q}, F)
\begin{align*}
h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\
h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\
&\vdots \\
h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\
\textcolor{#FF5252}{z} &:= \sigma\left(w_{n+1} h_{n} + b_{n+1}\right) \longrightarrow
\end{align*}
output: \hspace{7pt} (s_{P}, t_{P}, q_{P})
Network (pt. 2)
Use (s_{P}, t_{P}, q_{P}) to update \Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right):
forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]
backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}
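A useful property of these updates is exact reversibility: \Gamma^{-} undoes \Gamma^{+} for the same (s_{P}, t_{P}, q_{P}). A NumPy check (scalar arrays stand in for the lattice quantities; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma_plus(p, f, s, t, q, eps):
    """Forward (d = +) network-augmented momentum update."""
    return p * np.exp(0.5 * eps * s) - 0.5 * eps * (f * np.exp(eps * q) + t)

def gamma_minus(p, f, s, t, q, eps):
    """Backward (d = -) update: the exact inverse of gamma_plus."""
    return np.exp(-0.5 * eps * s) * (p + 0.5 * eps * (f * np.exp(eps * q) + t))

p, f, s, t, q = (rng.normal(size=16) for _ in range(5))
eps = 0.1
p_fwd = gamma_plus(p, f, s, t, q, eps)
p_back = gamma_minus(p_fwd, f, s, t, q, eps)
# gamma_minus inverts gamma_plus exactly (up to floating point),
# which is what makes the MH acceptance probability tractable.
```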
📈 Improvement
We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.
Note: lower is better
Deviation in x_{P}
Topological charge mixing
Artificial influx of energy
Further code development
Continue to use / test different network architectures
Continue to test different loss functions for training
Scaling:
🙏 Acknowledgements
This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
slides (GitHub: saforem2/lattice23)
Deviation from V\rightarrow\infty limit, x_{P}^{\ast}
Average \langle x_{P}\rangle, with x_{P}^{\ast} (dotted-lines)
Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}
Where:
\delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}
A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
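The loss is straightforward to evaluate given batches of Q before and after a trajectory together with the acceptance probabilities; a minimal sketch (names are illustrative):

```python
import numpy as np

def esjd_loss(q_prop, q_init, acc_prob):
    """L = E[ -(delta Q)^2 * A(xi*|xi) ]: minimizing this maximizes
    the expected acceptance-weighted squared charge difference."""
    dq = np.abs(q_prop - q_init)
    return -np.mean(dq**2 * acc_prob)

# Larger tunneling (delta Q) at the same acceptance gives a lower loss.
small = esjd_loss(np.array([1.0, 0.0]), np.array([0.0, 0.0]), np.array([0.9, 0.9]))
large = esjd_loss(np.array([2.0, 1.0]), np.array([0.0, 0.0]), np.array([0.9, 0.9]))
```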
Stack gauge links as shape\left(U_{\mu}\right) = [Nb, 2, Nt, Nx] \in \mathbb{C}
x_{\mu}(n) \rightarrow \left[\cos(x), \sin(x)\right], with shape\left(x_{\mu}\right) = [Nb, 2, Nt, Nx, 2] \in \mathbb{R}
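The stacking can be written directly in NumPy (batch and lattice sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
Nb, Nt, Nx = 4, 8, 8

x = rng.uniform(-np.pi, np.pi, size=(Nb, 2, Nt, Nx))  # link phases x_mu(n), real
U = np.exp(1j * x)                                     # U_mu(n) = e^{i x_mu(n)}, complex
# Real-valued network input: stack [cos(x), sin(x)] on a trailing axis.
x_stack = np.stack([np.cos(x), np.sin(x)], axis=-1)    # shape [Nb, 2, Nt, Nx, 2]
```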
v-Network:
\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]
\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
x-Network:
\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]
\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}
🔗 Link Variables
U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)
Wilson Action
S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},
\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]
Note: \textcolor{#00CCFF}{x_{P}} is the phase of the product of links around a 1\times 1 square, called a "plaquette"
Introduce an annealing schedule during the training phase:
\left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}
where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1
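One simple realization of such a schedule is a linear ramp (the choice of \gamma_{0} and N here is illustrative):

```python
import numpy as np

def annealing_schedule(n_steps, gamma0=0.1):
    """Monotone schedule gamma_0 < gamma_1 < ... < gamma_N = 1,
    with |gamma_{t+1} - gamma_t| << 1."""
    return np.linspace(gamma0, 1.0, n_steps + 1)

gammas = annealing_schedule(100)
```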