
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Six-bit (FP6) quantization can effectively reduce the size of large language models while consistently preserving model quality across varied applications.
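For intuition, the sketch below shows round-to-nearest quantization to a 6-bit floating-point format, assuming a 1-sign/3-exponent/2-mantissa (E3M2) layout. The function name, the simplified subnormal handling, and the reserved top exponent code are illustrative assumptions, not the repository's CUDA kernels.

```python
import numpy as np

EXP_BITS, MAN_BITS = 3, 2
BIAS = 2 ** (EXP_BITS - 1) - 1          # exponent bias = 3

def fp6_quantize(x: np.ndarray) -> np.ndarray:
    """Round float32 values to the nearest representable E3M2 value.

    Simplified: no NaN/Inf handling; the lowest exponent bin doubles
    as a subnormal-like grid that includes zero.
    """
    sign = np.sign(x)
    mag = np.abs(x).astype(np.float64)
    # Unbiased exponent of each value, clamped to the representable range
    # (top exponent code reserved, as in IEEE-style formats; a simplification).
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, 1 - BIAS, 2 ** EXP_BITS - 2 - BIAS)
    scale = 2.0 ** (exp - MAN_BITS)      # spacing between adjacent codes
    q = np.round(mag / scale) * scale    # round mantissa to 2 bits
    # Largest representable magnitude: (2 - 2^-2) * 2^3 = 14.0
    max_val = (2 - 2.0 ** -MAN_BITS) * 2.0 ** (2 ** EXP_BITS - 2 - BIAS)
    return sign * np.minimum(q, max_val)

w = np.array([0.8731, -0.0042, 3.14, 12.0], dtype=np.float32)
print(fp6_quantize(w))   # e.g. 0.8731 -> 0.875, 3.14 -> 3.0
```

Each float32 weight snaps to one of at most 64 representable values, so storage falls to 6 bits per weight while the value grid stays densest near zero, where most trained weights lie.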

Data and Resources

This dataset has no data

Cite this as

Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song (2025). Dataset: FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. https://doi.org/10.57702/z4ohr5qr

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined in: https://doi.org/10.48550/arXiv.2401.14112
Authors: Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song
Homepage: https://github.com/usyd-fsalab/fp6_llm