AuroraGPT: General-purpose scientific LLM
Broadly trained on general corpora plus scientific {papers, texts, data}
Aurora system totals:

| Spec  | Value  |
| ----- | ------ |
| Racks | 166    |
| Nodes | 10,624 |
| CPUs  | 21,248 |
| GPUs  | 63,744 |
| NICs  | 84,992 |
| HBM   | 8 PB   |
| DDR5  | 10 PB  |
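These totals follow from Aurora's per-node layout of 2 CPUs, 6 GPUs, and 8 NICs; a quick sketch to check the arithmetic (the per-node counts are the only assumption here):

```python
# Sanity-check the per-node composition implied by the system totals.
NODES = 10_624

totals = {"CPUs": 21_248, "GPUs": 63_744, "NICs": 84_992}
per_node = {"CPUs": 2, "GPUs": 6, "NICs": 8}  # assumed Aurora node layout

for name, total in totals.items():
    assert total == NODES * per_node[name], f"{name} mismatch"
    print(f"{name}: {NODES:,} nodes x {per_node[name]}/node = {total:,}")
```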
Up to a 25× improvement for genomic foundation models, with a 6.5× gain in energy efficiency
The AuroraGPT effort is organized into working groups:

- Planning
- Data
- Training
- Evaluation
- Post-Training
- Inference
- Comms
- Distribution
✅ Goals
❌ Challenges
- [argonne-lcf/Megatron-DeepSpeed](https://github.com/argonne-lcf/Megatron-DeepSpeed): ALCF's fork of Megatron-DeepSpeed, used for large-scale LLM training
- [saforem2/ezpz](https://github.com/saforem2/ezpz): utilities for setting up distributed training across launchers and backends
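For a sense of what ezpz abstracts away, here is a minimal sketch of distributed setup in plain torch.distributed; ezpz aims to reduce this kind of boilerplate (launcher detection, backend selection, rank bookkeeping) to a single setup call. The backend choice below is generic PyTorch, not Aurora-specific:

```python
# Minimal distributed-setup sketch using plain torch.distributed.
# Launchers such as torchrun/mpiexec populate RANK and WORLD_SIZE.
import os

import torch
import torch.distributed as dist


def setup() -> int:
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    # Generic backend choice; on Aurora's Intel GPUs, a oneCCL-based
    # backend would be used instead of NCCL.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
    return rank


if __name__ == "__main__":
    rank = setup()
    print(f"Initialized rank {rank} of {dist.get_world_size()}")
    dist.destroy_process_group()
```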
🙏 Acknowledgements
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.