🎰 Deterministic flash-attn

AuroraGPT
Author: Sam Foreman

Published: June 17, 2024

Modified: November 18, 2024

[NOTE]: For additional details, refer to the W&B Report.

These are simple tests to confirm that the loss is exactly reproducible across independent runs launched with the same seed.

Figure 1: Plot of the loss curve for 3 independent runs with `deterministic=True`
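
One way to make "exactly reproducible" concrete is to compare the per-step loss values from each run bit for bit, rather than within a tolerance. The sketch below is only an illustration of that check and is not the code used for the runs in Figure 1: `toy_training_run` is a hypothetical stand-in for a real training job (it does not call flash-attn at all), and `first_divergence` is a helper name introduced here. In practice the loss histories would come from real runs launched with the same seed and `deterministic=True`.

```python
import random
import struct
from typing import List, Optional


def toy_training_run(seed: int, steps: int = 50) -> List[float]:
    """Hypothetical stand-in for a training run; returns one loss per step."""
    rng = random.Random(seed)  # all randomness flows from this single seed
    weight = rng.uniform(-1.0, 1.0)
    losses = []
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)
        error = weight * x - 3.0 * x  # the target function is y = 3x
        losses.append(error * error)
        weight -= 0.05 * 2.0 * error * x  # plain SGD step on the squared error
    return losses


def bit_pattern(value: float) -> bytes:
    """Raw IEEE-754 bytes, so the comparison is bitwise, not approximate."""
    return struct.pack("<d", value)


def first_divergence(reference: List[float], other: List[float]) -> Optional[int]:
    """Index of the first step whose loss differs bitwise, or None if identical."""
    for step, (a, b) in enumerate(zip(reference, other)):
        if bit_pattern(a) != bit_pattern(b):
            return step
    if len(reference) != len(other):
        # One history is a strict prefix of the other.
        return min(len(reference), len(other))
    return None


# Three independent "runs" with the same seed, mirroring the setup in Figure 1.
runs = [toy_training_run(seed=1234) for _ in range(3)]
reproducible = all(first_divergence(runs[0], run) is None for run in runs[1:])
print("bitwise reproducible:", reproducible)
```

Comparing raw bytes instead of using `==` or a tolerance is a deliberate choice here: it also treats `0.0` and `-0.0` as different, so any change in the arithmetic shows up as a divergence at a specific step.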

Citation

BibTeX citation:
@online{foreman2024,
  author = {Foreman, Sam},
  title = {🎰 {Deterministic} `Flash-Attn`},
  date = {2024-06-17},
  url = {https://samforeman.me/posts/AuroraGPT/determinstic-flash-attn/},
  langid = {en}
}
For attribution, please cite this work as:
Foreman, Sam. 2024. “🎰 Deterministic `Flash-Attn`.” June 17, 2024. https://samforeman.me/posts/AuroraGPT/determinstic-flash-attn/.