FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

doi:10.57702/z4ohr5qr


Six-bit quantization (FP6) can effectively reduce the size of large language models while consistently preserving model quality across varied applications.
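The size reduction is simple arithmetic: each weight shrinks from 16 bits to 6, so FP6 weights need 37.5% of the FP16 footprint. The Python sketch below makes that concrete for a hypothetical 70B-parameter model and also rounds a few weights onto an FP6 grid. The E3M2 layout (1 sign, 3 exponent, 2 mantissa bits, bias 3, no inf/NaN), the single per-tensor scale, and the helper names `weight_bytes`, `fp6_e3m2_values` and `quantize_fp6` are assumptions for illustration, not the paper's quantization scheme or GPU kernel.

```python
import math
from typing import List


def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


def fp6_e3m2_values(bias: int = 3) -> List[float]:
    """Non-negative magnitudes of an ASSUMED FP6 E3M2 format (no inf/NaN)."""
    values = []
    for exp in range(8):          # 3 exponent bits -> 8 exponent codes
        for man in range(4):      # 2 mantissa bits -> 4 mantissa codes
            if exp == 0:          # subnormal: 0.man * 2^(1 - bias)
                v = (man / 4) * 2.0 ** (1 - bias)
            else:                 # normal: 1.man * 2^(exp - bias)
                v = (1 + man / 4) * 2.0 ** (exp - bias)
            values.append(v)
    return sorted(set(values))    # 32 magnitudes, from 0.0 up to 28.0


def quantize_fp6(xs: List[float]) -> List[float]:
    """Fake-quantize: scale into the FP6 range, round to the nearest grid
    magnitude (ties go to the smaller one), then scale back.
    Uses one per-tensor scale, an illustrative ASSUMPTION."""
    grid = fp6_e3m2_values()
    max_abs = max(abs(x) for x in xs) or 1.0
    scale = max_abs / grid[-1]    # maps the largest |x| onto 28.0
    out = []
    for x in xs:
        mag = min(grid, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


if __name__ == "__main__":
    n = 70_000_000_000                    # hypothetical 70B-parameter model
    print(weight_bytes(n, 16) / 1e9)      # 140.0 GB at FP16
    print(weight_bytes(n, 6) / 1e9)       # 52.5 GB at FP6
    print(quantize_fp6([0.5, -1.0, 0.03]))
```

A single per-tensor scale keeps the sketch short; finer-grained scales (for example one per output channel) usually reduce rounding error at the cost of storing more scale factors.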

BibTeX:

@dataset{Haojun_Xia_and_Zhen_Zheng_and_Xiaoxia_Wu_and_Shiyang_Chen_and_Zhewei_Yao_and_Stephen_Youn_and_Arash_Bakhtiari_and_Michael_Wyatt_and_Yuxiong_He_and_Olatunji_Ruwase_and_Shuaiwen_Leon_Song_2025,
    abstract = {Six-bit quantization can effectively reduce the size of large language models and preserve the model quality consistently across varied applications.},
    author = {Haojun Xia and Zhen Zheng and Xiaoxia Wu and Shiyang Chen and Zhewei Yao and Stephen Youn and Arash Bakhtiari and Michael Wyatt and Yuxiong He and Olatunji Ruwase and Shuaiwen Leon Song},
    doi = {10.57702/z4ohr5qr},
    keywords = {6-bit quantization, FP6, Large Language Models, SIMT Cores, Tensor Cores},
    month = {jan},
    publisher = {TIB},
    title = {FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design},
    url = {https://service.tib.eu/ldmservice/dataset/fp6-llm--efficiently-serving-large-language-models-through-fp6-centric-algorithm-system-co-design},
    year = {2025}
}