FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Six-bit quantization can effectively reduce the size of large language models and preserve the model quality consistently across varied applications.
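To make this concrete, the following is a minimal sketch of 6-bit floating-point weight quantization in Python. It assumes an e3m2 layout (1 sign bit, 3 exponent bits, 2 mantissa bits, exponent bias 3), one common FP6 variant, together with simple per-tensor symmetric scaling; the fp6_e3m2_grid and quantize_fp6 helpers are illustrative assumptions, not the kernels shipped in the fp6_llm repository.

    import numpy as np

    def fp6_e3m2_grid():
        # All non-negative values of an assumed e3m2 FP6 format:
        # 1 sign, 3 exponent, 2 mantissa bits, exponent bias 3.
        vals = set()
        for e in range(8):           # 3-bit exponent field
            for m in range(4):       # 2-bit mantissa field
                if e == 0:           # subnormals: (m/4) * 2^(1 - bias)
                    vals.add((m / 4.0) * 2.0 ** (1 - 3))
                else:                # normals: (1 + m/4) * 2^(e - bias)
                    vals.add((1.0 + m / 4.0) * 2.0 ** (e - 3))
        return np.array(sorted(vals))    # 32 magnitudes, 0 .. 28

    def quantize_fp6(w):
        # Symmetric per-tensor quantization: scale weights into the
        # FP6 range, then snap each one to the nearest grid point.
        grid = fp6_e3m2_grid()
        scale = np.abs(w).max() / grid.max()   # map max |w| onto max FP6
        nearest = grid[np.abs(np.abs(w)[..., None] / scale - grid).argmin(-1)]
        return np.sign(w) * nearest * scale

    w = np.random.randn(4, 8).astype(np.float32)
    w_q = quantize_fp6(w)
    print("max abs quantization error:", np.abs(w - w_q).max())

The numerics above are the straightforward part; the paper's contribution is on the system side, a GPU kernel design (TC-FPx) that makes the non-byte-aligned 6-bit weights efficient to load and dequantize on Tensor Cores.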

Cite this as

Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song (2025). Dataset: FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. https://doi.org/10.57702/z4ohr5qr

DOI retrieved: January 3, 2025

Additional Info

Field         Value
Created       January 3, 2025
Last update   January 3, 2025
Defined In    https://doi.org/10.48550/arXiv.2401.14112
Author        Haojun Xia
More Authors  Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song
Homepage      https://github.com/usyd-fsalab/fp6_llm