FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
Six-bit quantization (FP6) can effectively reduce the size of large language models while consistently preserving model quality across varied applications.
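To make the size claim concrete, here is a minimal Python sketch. It is not the paper's kernel or implementation. It assumes an E3M2 bit layout (1 sign, 3 exponent, 2 mantissa bits, exponent bias 3, no Inf/NaN encodings, as FP6 E3M2 is defined in the OCP Microscaling (MX) specification) and a single per-tensor scale. The names `quantize_fp6`, `quantize_tensor`, and `weight_bytes` are hypothetical helpers. Storing weights in 6 bits instead of 16 shrinks them by a factor of 16/6 ≈ 2.67, ignoring the small overhead of the stored scales.

```python
import bisect


def e3m2_values():
    """All non-negative magnitudes of a 6-bit E3M2 float.

    Assumption: 1 sign, 3 exponent, 2 mantissa bits, bias 3, no Inf/NaN.
    """
    bias = 3
    vals = set()
    for e in range(8):          # 3 exponent bits
        for m in range(4):      # 2 mantissa bits
            if e == 0:          # subnormal: 2^(1-bias) * m/4
                v = 2.0 ** (1 - bias) * (m / 4)
            else:               # normal: 2^(e-bias) * (1 + m/4)
                v = 2.0 ** (e - bias) * (1 + m / 4)
            vals.add(v)
    return sorted(vals)


GRID = e3m2_values()  # 32 magnitudes from 0.0 up to 28.0


def quantize_fp6(x: float, scale: float) -> float:
    """Round x/scale to the nearest E3M2 magnitude (saturating), then rescale.

    Ties go to the smaller magnitude; this is a simplification, not
    round-half-to-even.
    """
    mag = min(abs(x) / scale, GRID[-1])  # saturate at the largest value
    i = bisect.bisect_left(GRID, mag)
    if i == 0:                           # only reached when mag == 0.0
        q = GRID[0]
    else:
        lo, hi = GRID[i - 1], GRID[i]
        q = lo if mag - lo <= hi - mag else hi
    return (q if x >= 0 else -q) * scale


def quantize_tensor(ws):
    """Hypothetical per-tensor scheme: map max|w| onto the largest FP6 value."""
    scale = max(abs(w) for w in ws) / GRID[-1] or 1.0  # avoid a zero scale
    return [quantize_fp6(w, scale) for w in ws], scale


def weight_bytes(n_params: int, bits: int) -> float:
    """Raw weight storage in bytes, excluding scales and other metadata."""
    return n_params * bits / 8


if __name__ == "__main__":
    n = 70_000_000_000  # hypothetical 70B-parameter model
    print(weight_bytes(n, 16) / 1e9, "GB in FP16")
    print(weight_bytes(n, 6) / 1e9, "GB in FP6")
    print(quantize_tensor([0.5, -1.3, 2.8, 0.01]))
```

For the hypothetical 70B-parameter model this prints 140.0 GB for FP16 and 52.5 GB for FP6. The coarse grid (only 32 magnitudes) is why real FP6 schemes pair the format with scaling; whether quality is preserved in practice is an empirical question the paper addresses, not something this sketch demonstrates.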
BibTeX: