AuroraGPT: General-purpose scientific LLM, broadly trained on a general corpus plus scientific {papers, texts, data}
Awesome-LLM
🚂 Training
 argonne-lcf/Megatron-DeepSpeed
Large Model Training: Any Scale, Any Accelerator
🏃‍♂️ Running
 argonne-lcf/inference-endpoints
Inference endpoints for LLMs, hosted @ ALCF
| Property | Value |
|---|---|
| Racks | 166 |
| Nodes | 10,624 |
| CPUs | 21,248 |
| GPUs | 63,744 |
| NICs | 84,992 |
| HBM | 8 PB |
| DDR5c | 10 PB |
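The counts in the table are mutually consistent, which is easy to check with a little arithmetic. The sketch below is only an illustration: the numbers are copied from the table, while the dictionary layout and the helper name `per_node` are hypothetical and not part of any ALCF tooling.

```python
# Counts copied from the table above; names are illustrative only.
specs = {
    "Racks": 166,
    "Nodes": 10_624,
    "CPUs": 21_248,
    "GPUs": 63_744,
    "NICs": 84_992,
}


def per_node(specs: dict[str, int]) -> dict[str, int]:
    """Return how many of each component there are per node.

    Raises ValueError if a count is not an exact multiple of the node count.
    """
    nodes = specs["Nodes"]
    ratios = {}
    for name, count in specs.items():
        if name in ("Racks", "Nodes"):
            continue
        quotient, remainder = divmod(count, nodes)
        if remainder:
            raise ValueError(f"{name} is not a whole multiple of Nodes")
        ratios[name] = quotient
    return ratios


ratios = per_node(specs)
nodes_per_rack = specs["Nodes"] // specs["Racks"]

print(ratios)          # {'CPUs': 2, 'GPUs': 6, 'NICs': 8}
print(nodes_per_rack)  # 64
```

Read this way, the table describes nodes with 2 CPUs, 6 GPUs, and 8 NICs each, at 64 nodes per rack.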
We need our implementation to be:
CUDA, ROCm, XPU, CPU, MPS, …)
This is incredibly difficult in practice, due in part to:
The original implementation was slow:
🔭 LLMs for Science
ChatGPT: explain this image 
~ 4 EFLOPS @ Aurora
38,400 XPUs
= 3200 [node] x 12 [XPU / node]
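The device count above is a simple product, and it also gives a rough sense of the implied per-device throughput. The following is a back-of-the-envelope sketch: the node count, XPUs per node, and the "~4 EFLOPS" figure come from the text, while the variable names are made up for illustration and the numeric precision behind the EFLOPS figure is not stated here, so the per-XPU number should be read as an approximate average only.

```python
# Figures taken from the text above; variable names are illustrative.
NODES = 3_200            # nodes used
XPUS_PER_NODE = 12       # XPUs per node
AGGREGATE_EFLOPS = 4.0   # "~4 EFLOPS", approximate

total_xpus = NODES * XPUS_PER_NODE

# Implied average throughput per XPU, in TFLOPS (1 EFLOPS = 1e6 TFLOPS).
per_xpu_tflops = AGGREGATE_EFLOPS * 1e6 / total_xpus

print(total_xpus)                # 38400
print(round(per_xpu_tflops, 1))  # 104.2
```

So roughly 104 TFLOPS per XPU on average, under the assumption that the aggregate figure is spread evenly across all 38,400 devices.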
🔔 Gordon Bell Finalist:
SEQ_LEN for both 25B and 33B models (See: Song et al. (2023))
Megatron-DeepSpeed
ezpz
🙏 Acknowledgements
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
