MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory
Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}
2023-07-31 @ Lattice 2023
🎯 Goal
Generate independent samples $\{x_{i}\}$, such that $\{x_{i}\} \sim p(x) \propto e^{-S(x)}$, where $S(x)$ is the action (or potential energy)
If these were independent, we could approximate: $\langle \mathcal{O} \rangle \simeq \frac{1}{N} \sum_{n=1}^{N} \mathcal{O}(x_{n})$
$\sigma^{2}_{\mathcal{O}} = \frac{1}{N} \mathrm{Var}\left[\mathcal{O}(x)\right] \Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}$
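As a concrete illustration of this $1/\sqrt{N}$ scaling, a minimal NumPy sketch (the normally-distributed `obs` is just a stand-in for $\mathcal{O}(x_{n})$, not a lattice observable):

```python
import numpy as np

def mc_estimate(obs: np.ndarray) -> tuple[float, float]:
    """Estimate <O> and its error from *independent* samples O(x_n)."""
    n = len(obs)
    mean = obs.mean()
    err = np.sqrt(obs.var(ddof=1) / n)   # sigma_O^2 = Var[O(x)] / N  =>  sigma_O ~ 1 / sqrt(N)
    return mean, err

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    obs = rng.normal(size=n)             # stand-in for O(x_n), x_n ~ p(x)
    mean, err = mc_estimate(obs)
    print(f"N={n:>9d}  <O> = {mean:+.4f} +/- {err:.5f}")
```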
🎯 Goal
Generate independent samples $\{x_{i}\}$, such that $\{x_{i}\} \sim p(x) \propto e^{-S(x)}$, where $S(x)$ is the action (or potential energy)
Instead, nearby configs are correlated, and we incur a factor of $\tau^{\mathcal{O}}_{\mathrm{int}}$: $\sigma^{2}_{\mathcal{O}} = \frac{\tau^{\mathcal{O}}_{\mathrm{int}}}{N} \mathrm{Var}\left[\mathcal{O}(x)\right]$
Want to (sequentially) construct a chain of states: $x_{0} \rightarrow x_{1} \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}$
such that, as $N \rightarrow \infty$: $\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N}\} \xrightarrow[]{N \rightarrow \infty} p(x) \propto e^{-S(x)}$
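A hedged sketch of estimating the $\tau^{\mathcal{O}}_{\mathrm{int}}$ factor from a correlated chain with a naive windowed autocorrelation sum; the AR(1) chain in the example is purely illustrative:

```python
import numpy as np

def tau_int(chain: np.ndarray, window: int = 200) -> float:
    """Naive windowed estimate of the integrated autocorrelation time."""
    x = np.asarray(chain, float) - np.mean(chain)
    n, var = len(x), np.var(x)
    rho = [x[:n - t] @ x[t:] / ((n - t) * var) for t in range(1, window)]
    return 1.0 + 2.0 * np.sum(rho)

def chain_error(chain: np.ndarray) -> float:
    """sigma_O^2 = (tau_int / N) * Var[O(x)] for a correlated chain."""
    return np.sqrt(tau_int(chain) * np.var(chain, ddof=1) / len(chain))

# Illustrative AR(1) chain with phi = 0.9, for which tau_int = (1 + 0.9) / (1 - 0.9) = 19
rng = np.random.default_rng(0)
chain = np.zeros(100_000)
for i in range(1, len(chain)):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()
print(f"tau_int ~ {tau_int(chain):.1f}")
```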
🪄 Trick
Idea: Evolve the $(\dot{x}, \dot{v})$ system to get new states $\{x_{i}\}$ ❗
Write the joint distribution $p(x, v)$: $p(x, v) \propto e^{-S[x]}\, e^{-\frac{1}{2} v^{T} v} = e^{-H(x, v)}$
🔋 Hamiltonian Dynamics
$H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \quad \dot{v} = -\partial_{x} H$
Figure 1: Overview of HMC algorithm
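For reference, a minimal plain-HMC step implementing these equations, assuming hypothetical callables `action(x)` and `action_grad(x)` for $S(x)$ and $\partial_{x} S(x)$:

```python
import numpy as np

def hmc_step(x, action, action_grad, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    """One HMC update: resample v ~ N(0, 1), leapfrog-integrate (x, v), then accept/reject."""
    v = rng.normal(size=x.shape)
    x1, v1 = x.copy(), v.copy()
    v1 -= 0.5 * eps * action_grad(x1)          # half-step: vdot = -dH/dx = -dS/dx
    for _ in range(n_lf - 1):
        x1 += eps * v1                         # full step: xdot = +dH/dv = v
        v1 -= eps * action_grad(x1)
    x1 += eps * v1
    v1 -= 0.5 * eps * action_grad(x1)          # final half-step in v
    dH = action(x1) + 0.5 * v1 @ v1 - (action(x) + 0.5 * v @ v)
    return (x1, True) if rng.random() < np.exp(-dH) else (x.copy(), False)

# Toy usage: sample from p(x) ∝ exp(-S(x)) with S(x) = x^T x / 2
x = np.zeros(4)
for _ in range(5):
    x, accepted = hmc_step(x, lambda y: 0.5 * y @ y, lambda y: y)
```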
Introduce $d \sim \mathcal{U}(\pm)$ to determine the direction of our update
$v' = \Gamma^{\pm}(x, v)$ (update $v$)
$x' = x_{B} + \Lambda^{\pm}(x_{A}, v')$ (update first half: $x_{A}$)
$x'' = x'_{A} + \Lambda^{\pm}(x'_{B}, v')$ (update other half: $x_{B}$)
$v'' = \Gamma^{\pm}(x'', v')$ (update $v$)
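A schematic sketch of one such generalized leapfrog layer; `gamma(x, v, d)` and `lam(x, v, d)` are hypothetical stand-ins for the learned $\Gamma^{\pm}$, $\Lambda^{\pm}$, and `mask` splits $x$ into the halves $x_{A}$, $x_{B}$ (the actual l2hmc-qcd layers also accumulate the log-Jacobian):

```python
def leapfrog_layer(x, v, mask, gamma, lam, d=+1):
    """One generalized leapfrog layer; `mask` is 1 on the x_A sites and 0 on the x_B sites."""
    mbar = 1.0 - mask
    v1 = gamma(x, v, d)                                # v'  = Gamma^{+/-}(x, v)
    x1 = mbar * x + mask * lam(mask * x, v1, d)        # x'  = x_B + Lambda^{+/-}(x_A, v')
    x2 = mask * x1 + mbar * lam(mbar * x1, v1, d)      # x'' = x'_A + Lambda^{+/-}(x'_B, v')
    v2 = gamma(x2, v1, d)                              # v'' = Gamma^{+/-}(x'', v')
    return x2, v2
```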
👨‍💻 Algorithm
input: $x$
forward: Generate proposal $\xi'$ by passing initial $\xi$ through $N_{\mathrm{LF}}$ leapfrog layers:
$\xi \xrightarrow[]{\text{LF layer}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \xi' := (x'', v'')$
backward (if training): Evaluate the loss function $\mathcal{L} \leftarrow L_{\theta}(\xi', \xi)$ and backpropagate
return: $x_{i+1}$
Evaluate the MH criteria and return the accepted config:
$x_{i+1} \leftarrow \begin{cases} x'' & \text{w/ prob } A(\xi'' \mid \xi)\;\text{✅} \\ x & \text{w/ prob } 1 - A(\xi'' \mid \xi)\;\text{🚫} \end{cases}$
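A minimal sketch of this accept/reject step, assuming a hypothetical `log_prob` for $\log p(\xi)$ and a `log_det_J` accumulated by the leapfrog layers (zero for plain HMC):

```python
import numpy as np

def mh_accept(x, x_prop, log_prob, log_det_J=0.0, rng=np.random.default_rng()):
    """A(xi''|xi) = min(1, p(xi'')/p(xi) * |d xi''/d xi|), with the Jacobian given in log form."""
    log_a = min(0.0, log_prob(x_prop) - log_prob(x) + log_det_J)
    if np.log(rng.random()) < log_a:
        return x_prop, True        # ✅ accepted
    return x, False                # 🚫 rejected
```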
🔗 Link Variables
Write link variables $U_{\mu}(x) \in SU(3)$:
$U_{\mu}(x) = \exp\left[i\, \omega^{k}_{\mu}(x)\, \lambda^{k}\right] = e^{iQ}, \quad \text{with} \quad Q \in \mathfrak{su}(3)$
where $\omega^{k}_{\mu}(x) \in \mathbb{R}$, and $\lambda^{k}$ are the generators of $SU(3)$
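An illustrative NumPy/SciPy sketch (not from l2hmc-qcd) of building such a link from the coefficients $\omega^{k}_{\mu}(x)$ and the Gell-Mann matrices:

```python
import numpy as np
from scipy.linalg import expm

# Gell-Mann matrices lambda_k (generators of SU(3)), k = 1..8
lam = np.zeros((8, 3, 3), dtype=complex)
lam[0][0, 1] = lam[0][1, 0] = 1
lam[1][0, 1], lam[1][1, 0] = -1j, 1j
lam[2][0, 0], lam[2][1, 1] = 1, -1
lam[3][0, 2] = lam[3][2, 0] = 1
lam[4][0, 2], lam[4][2, 0] = -1j, 1j
lam[5][1, 2] = lam[5][2, 1] = 1
lam[6][1, 2], lam[6][2, 1] = -1j, 1j
lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)

def su3_link(omega: np.ndarray) -> np.ndarray:
    """U = exp(i * omega_k * lambda_k), with omega in R^8 and Q = omega_k lambda_k in su(3)."""
    q = np.tensordot(omega, lam, axes=1)      # Q: Hermitian, traceless
    return expm(1j * q)                       # U in SU(3)

u = su3_link(np.random.default_rng(0).normal(size=8))
assert np.allclose(u.conj().T @ u, np.eye(3), atol=1e-10)   # unitary
assert np.isclose(np.linalg.det(u), 1.0, atol=1e-10)        # det U = 1
```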
🏃‍♂️➡️ Conjugate Momenta
🟥 Wilson Action
$S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]$
where $U_{\mu\nu}(x) = U_{\mu}(x)\, U_{\nu}(x + \hat{\mu})\, U^{\dagger}_{\mu}(x + \hat{\nu})\, U^{\dagger}_{\nu}(x)$
Hamiltonian: $H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow \dot{U} = +\partial_{P} H, \quad \dot{P} = -\partial_{U} H$
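A hedged NumPy sketch of evaluating $S_{G}$ from a stack of links with periodic boundaries; the array layout `[ndim, *lattice, 3, 3]` is an assumption made for illustration:

```python
import numpy as np

def wilson_action(u: np.ndarray, beta: float) -> float:
    """S_G = -(beta/6) * sum_{x, mu<nu} Tr[U_{mu nu}(x) + U_{mu nu}^dagger(x)], periodic b.c."""
    dag = lambda m: np.conj(np.swapaxes(m, -2, -1))
    ndim, s = u.shape[0], 0.0
    for mu in range(ndim):
        for nu in range(mu + 1, ndim):
            u_nu_xpmu = np.roll(u[nu], -1, axis=mu)    # U_nu(x + mu_hat)
            u_mu_xpnu = np.roll(u[mu], -1, axis=nu)    # U_mu(x + nu_hat)
            plaq = u[mu] @ u_nu_xpmu @ dag(u_mu_xpnu) @ dag(u[nu])
            s += np.trace(plaq, axis1=-2, axis2=-1).real.sum()
    return -(beta / 6.0) * 2.0 * s                     # Tr[U] + Tr[U^dag] = 2 Re Tr[U]

# Toy usage: a "cold" (unit-link) 4D lattice
lat = (4, 4, 4, 4)
u = np.broadcast_to(np.eye(3, dtype=complex), (4, *lat, 3, 3)).copy()
print(wilson_action(u, beta=6.0))
```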
Network (pt. 1)
input: $(U, F) := (e^{iQ}, F)$
$h_{0} = \sigma\left(w_{Q} Q + w_{F} F + b\right)$
$h_{1} = \sigma\left(w_{1} h_{0} + b_{1}\right)$
$\;\;\vdots$
$h_{n} = \sigma\left(w_{n-1} h_{n-2} + b_{n}\right)$
$z := \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow$
output: $(s_{P}, t_{P}, q_{P})$
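A schematic PyTorch version of such a network; the width, depth, and activations are placeholders rather than the actual l2hmc-qcd architecture:

```python
import torch
import torch.nn as nn

class PNetwork(nn.Module):
    """Map the (flattened) inputs (U, F) to the update parameters (s_P, t_P, q_P)."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),   # h0 = sigma(w_Q Q + w_F F + b)
            nn.Linear(hidden, hidden), nn.ReLU(),    # h1, ..., h_n
        )
        self.s = nn.Linear(hidden, dim)   # s_P
        self.t = nn.Linear(hidden, dim)   # t_P
        self.q = nn.Linear(hidden, dim)   # q_P

    def forward(self, q_in: torch.Tensor, f: torch.Tensor):
        z = self.mlp(torch.cat([q_in, f], dim=-1))
        return self.s(z), self.t(z), self.q(z)
```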
Network (pt. 2)
Use $(s_{P}, t_{P}, q_{P})$ to update $\Gamma^{\pm}: (U, P) \rightarrow (U, P_{\pm})$:
forward $(d = +)$: $\Gamma^{+}(U, P) := P_{+} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]$
backward $(d = -)$: $\Gamma^{-}(U, P) := P_{-} = e^{-\frac{\varepsilon}{2} s_{P}}\left\{P + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{P}} + t_{P}\right]\right\}$
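The same update as a hedged PyTorch sketch, also returning the log-Jacobian contribution $\pm \frac{\varepsilon}{2} \sum s_{P}$ needed later for the acceptance probability:

```python
import torch

def update_momentum(p, force, s_p, t_p, q_p, eps, d=+1):
    """Gamma^{+/-}: update the momentum P using the network outputs (s_P, t_P, q_P).

    d=+1: P_+ = P * exp(eps/2 * s_P) - eps/2 * [F * exp(eps * q_P) + t_P]
    d=-1: P_- = exp(-eps/2 * s_P) * (P + eps/2 * [F * exp(eps * q_P) + t_P])
    """
    half = 0.5 * eps
    if d > 0:
        p_new = p * torch.exp(half * s_p) - half * (force * torch.exp(eps * q_p) + t_p)
        logdet = (half * s_p).sum(dim=-1)       # log|dP_+/dP|
    else:
        p_new = torch.exp(-half * s_p) * (p + half * (force * torch.exp(eps * q_p) + t_p))
        logdet = (-half * s_p).sum(dim=-1)
    return p_new, logdet
```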
📈 Improvement
We can measure the performance by comparing $\tau_{\mathrm{int}}$ for the trained model vs. HMC.
Note: lower is better
Deviation in $x_{P}$
Topological charge mixing
Artificial influx of energy
Further code development
Continue to use / test different network architectures
Continue to test different loss functions for training
Scaling:
🙏 Acknowledgements
This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
📊 slides (GitHub: saforem2/lattice23)
Deviation from the $V \rightarrow \infty$ limit, $x_{P}^{\ast}$
Average $\langle x_{P} \rangle$, with $x_{P}^{\ast}$ (dotted lines)
Want to maximize the expected squared charge difference: $L_{\theta}(\xi^{\ast}, \xi) = \mathbb{E}_{p(\xi)}\left[-\delta Q^{2}(\xi^{\ast}, \xi) \cdot A(\xi^{\ast} \mid \xi)\right]$
Where:
$\delta Q$ is the tunneling rate: $\delta Q(\xi^{\ast}, \xi) = |Q^{\ast} - Q|$
$A(\xi^{\ast} \mid \xi)$ is the probability of accepting the proposal $\xi^{\ast}$: $A(\xi^{\ast} \mid \xi) = \min\left(1, \frac{p(\xi^{\ast})}{p(\xi)} \left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right)$
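A minimal sketch of this loss for a batch of chains, assuming the topological charges and acceptance probabilities have already been computed elsewhere:

```python
import torch

def loss_fn(q_prop: torch.Tensor, q_init: torch.Tensor, acc_prob: torch.Tensor) -> torch.Tensor:
    """L_theta = E[ -dQ^2 * A(xi'|xi) ]: maximize the expected squared charge difference.

    q_prop / q_init are topological charges of the proposed / initial configs,
    acc_prob is the MH acceptance probability A(xi'|xi) for each chain in the batch.
    """
    dq2 = (q_prop - q_init) ** 2         # delta Q^2, the squared tunneling rate
    return -(dq2 * acc_prob).mean()      # minimizing this maximizes E[dQ^2 * A]
```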
Stack gauge links as $\mathrm{shape}(U_{\mu}) = [N_{b},\ 2,\ N_{t},\ N_{x}] \in \mathbb{C}$
$x_{\mu}(n) := [\cos(x), \sin(x)]$, with $\mathrm{shape}(x_{\mu}) = [N_{b},\ 2,\ N_{t},\ N_{x},\ 2] \in \mathbb{R}$
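A short NumPy sketch of this stacking (the lattice sizes in the example are arbitrary):

```python
import numpy as np

def stack_links(x: np.ndarray) -> np.ndarray:
    """Map U(1) link angles x of shape [Nb, 2, Nt, Nx] to real network inputs.

    Rather than feeding U_mu = exp(i x) in C, stack [cos(x), sin(x)] along a
    trailing axis, giving shape [Nb, 2, Nt, Nx, 2] in R.
    """
    return np.stack([np.cos(x), np.sin(x)], axis=-1)

x = np.random.default_rng(0).uniform(-np.pi, np.pi, size=(4, 2, 8, 8))  # [Nb, 2, Nt, Nx]
assert stack_links(x).shape == (4, 2, 8, 8, 2)
```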
v-Network, $\Gamma^{\pm}$:
$\Gamma^{+}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]$
$\Gamma^{-}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}}\left\{v + \frac{\varepsilon}{2}\left[F \cdot e^{\varepsilon q_{v}} + t_{v}\right]\right\}$
x-Network, $\Lambda^{\pm}$:
$\Lambda^{+}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]$
$\Lambda^{-}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}}\left\{x + \frac{\varepsilon}{2}\left[v \cdot e^{\varepsilon q_{x}} + t_{x}\right]\right\}$
Introduce an annealing schedule during the training phase:
$\{\gamma_{t}\}_{t=0}^{N} = \{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N}\}$
where $\gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1$, and $|\gamma_{t+1} - \gamma_{t}| \ll 1$
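A minimal sketch of such a schedule; the starting value $\gamma_{0} = 0.1$, and the assumption that $\gamma_{t}$ enters by rescaling the action as $\gamma_{t}\, S(x)$, are illustrative choices:

```python
import numpy as np

# Monotonic schedule gamma_0 < gamma_1 < ... < gamma_N = 1 with small steps
n_steps = 100
gammas = np.linspace(0.1, 1.0, n_steps + 1)
assert np.all(np.diff(gammas) > 0) and gammas[-1] == 1.0

# Assumed use during training: rescale the action, S_t(x) = gamma_t * S(x),
# so early targets p_t(x) ∝ exp(-gamma_t * S(x)) are flatter than the true p(x).
```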
Note: for $\gamma_{t} < 1$, the rescaled action $\gamma_{t}\, S(x)$ flattens the target distribution, which makes it easier for the sampler to move between modes early in training.