FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Six-bit quantization can effectively reduce the size of large language models and preserve the model quality consistently across varied applications.
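To make this concrete, the following is a minimal sketch of 6-bit floating-point weight quantization in Python. It assumes an e3m2 layout (1 sign bit, 3 exponent bits, 2 mantissa bits, exponent bias 3), one common FP6 variant, together with simple per-tensor symmetric scaling; the fp6_e3m2_grid and quantize_fp6 helpers are illustrative assumptions, not the kernels shipped in the fp6_llm repository.

    import numpy as np

    def fp6_e3m2_grid():
        # All non-negative values of an assumed e3m2 FP6 format:
        # 1 sign, 3 exponent, 2 mantissa bits, exponent bias 3.
        vals = set()
        for e in range(8):           # 3-bit exponent field
            for m in range(4):       # 2-bit mantissa field
                if e == 0:           # subnormals: (m/4) * 2^(1 - bias)
                    vals.add((m / 4.0) * 2.0 ** (1 - 3))
                else:                # normals: (1 + m/4) * 2^(e - bias)
                    vals.add((1.0 + m / 4.0) * 2.0 ** (e - 3))
        return np.array(sorted(vals))    # 32 magnitudes, 0 .. 28

    def quantize_fp6(w):
        # Symmetric per-tensor quantization: scale weights into the
        # FP6 range, then snap each one to the nearest grid point.
        grid = fp6_e3m2_grid()
        scale = np.abs(w).max() / grid.max()   # map max |w| onto max FP6
        nearest = grid[np.abs(np.abs(w)[..., None] / scale - grid).argmin(-1)]
        return np.sign(w) * nearest * scale

    w = np.random.randn(4, 8).astype(np.float32)
    w_q = quantize_fp6(w)
    print("max abs quantization error:", np.abs(w - w_q).max())

The numerics above are the straightforward part; the paper's contribution is on the system side, a GPU kernel design (TC-FPx) that makes the non-byte-aligned 6-bit weights efficient to load and dequantize on Tensor Cores.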

Cite this as

Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song (2025). Dataset: FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. https://doi.org/10.57702/z4ohr5qr

DOI retrieved: January 3, 2025

Additional Info

Field         Value
Created       January 3, 2025
Last update   January 3, 2025
Defined In    https://doi.org/10.48550/arXiv.2401.14112
Author        Haojun Xia
More Authors  Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Yuxiong He, Olatunji Ruwase, Shuaiwen Leon Song
Homepage      https://github.com/usyd-fsalab/fp6_llm