🎰 Deterministic flash-attn
[NOTE]: For additional details, refer to the W&B Report.
Simple tests to confirm the loss is exactly reproducible across independent runs (when launched with the same seed).
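A check like this amounts to comparing the per-step losses of two runs bit-for-bit. Below is a minimal Python sketch. It assumes each run's losses have already been collected into a list of floats. `toy_run` is a hypothetical stand-in for a real training run, and `first_mismatch` is an illustrative helper; neither is part of Megatron-DeepSpeed.

```python
import random
from typing import List, Optional


def first_mismatch(losses_a: List[float], losses_b: List[float]) -> Optional[int]:
    """Return the first step at which two loss curves differ bit-for-bit,
    or None if they are identical. float.hex() also distinguishes values
    that `==` treats as equal (e.g. 0.0 vs -0.0)."""
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        if float(a).hex() != float(b).hex():
            return step
    if len(losses_a) != len(losses_b):
        # One run logged more steps than the other.
        return min(len(losses_a), len(losses_b))
    return None


def toy_run(seed: int, num_steps: int = 10) -> List[float]:
    """Hypothetical stand-in for a training run: a seeded RNG drives a
    fake, decreasing loss curve."""
    rng = random.Random(seed)
    loss = 10.0
    losses = []
    for _ in range(num_steps):
        loss *= 1.0 - 0.1 * rng.random()
        losses.append(loss)
    return losses


# Two independent runs launched with the same seed.
print(first_mismatch(toy_run(seed=42), toy_run(seed=42)))  # None
```

The comparison uses `float.hex()` instead of a tolerance on purpose. The claim under test is exact reproducibility, so a difference in even the last bit counts as a failure.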
In particular, we set `deterministic=True` in all the `flash_attn_func(...)` calls in `megatron/model/transformer.py`.
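`flash_attn_func` in flash-attn accepts a `deterministic` argument. Its docstring describes this as selecting a deterministic implementation of the backward pass, and states that the forward pass is always deterministic.

The underlying issue is that floating-point addition is not associative. If partial results are accumulated in an order that varies from run to run, the low-order bits of the sum can change even when every input is identical.

The sketch below illustrates this with plain Python floats and no GPU. `accumulate` is a hypothetical helper that sums the same three numbers in an explicitly chosen order. It is not how flash-attn is implemented.

```python
def accumulate(partials, order):
    """Sum partial results in the given index order (illustrative stand-in
    for combining per-block contributions; not flash-attn code)."""
    total = 0.0
    for i in order:
        total += partials[i]
    return total


partials = [0.1, 0.2, 0.3]

# Fixed order: every "run" performs the same sequence of additions.
fixed_1 = accumulate(partials, [0, 1, 2])
fixed_2 = accumulate(partials, [0, 1, 2])

# Different order: same numbers, different rounding along the way.
reordered = accumulate(partials, [2, 1, 0])

print(fixed_1 == fixed_2)   # True
print(fixed_1, reordered)   # 0.6000000000000001 0.6
```

In a GPU kernel, a varying order can come from how concurrent threads or atomic adds happen to be scheduled, which the random seed does not control. Fixing that order is what a deterministic backward pass is meant to guarantee, typically at some cost in speed and memory.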
All experiments ran on Polaris @ ALCF, using:
Citation
BibTeX citation:
@online{foreman2024,
author = {Foreman, Sam},
title = {🎰 {Deterministic} `Flash-Attn`},
date = {2024-06-17},
url = {https://samforeman.me/posts/AuroraGPT/determinstic-flash-attn/},
langid = {en}
}
For attribution, please cite this work as:
Foreman, Sam. 2024. “🎰 Deterministic `Flash-Attn`.” June
17, 2024. https://samforeman.me/posts/AuroraGPT/determinstic-flash-attn/.