APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models

Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, their high computational cost and large model size pose a significant challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs, which considers not only the second-order information of each layer's weights but also, for the first time, the nonlinear effect of attention outputs on the entire model.
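To make the idea of second-order, sensitivity-driven mixed precision concrete, the sketch below shows one illustrative way a diagonal-Hessian proxy could be used to score layers and assign bit widths. This is a minimal assumption-based example, not the APTQ implementation: the layer names, the random stand-in Hessian diagonal, and the 2/4-bit split are hypothetical, and the attention-aware term (the nonlinear effect of attention outputs) described in the abstract is not modeled here.

```python
# Minimal sketch (not the authors' implementation): a second-order
# sensitivity score per layer drives mixed-precision bit allocation.
# Layer names, the diagonal-Hessian stand-in, and the bit budget are
# illustrative assumptions, not values from the APTQ paper.
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of a weight matrix to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def sensitivity(w, hess_diag, bits):
    """Second-order proxy: 0.5 * sum(H_ii * (w - w_q)^2)."""
    dw = w - quantize(w, bits)
    return 0.5 * float(np.sum(hess_diag * dw ** 2))

rng = np.random.default_rng(0)
layers = {name: rng.normal(size=(64, 64)) for name in ["attn.q", "attn.k", "mlp.up"]}
# Stand-in for a per-weight Hessian diagonal estimated on calibration data.
hess = {name: rng.uniform(0.1, 1.0, size=(64, 64)) for name in layers}

# Rank layers by quantization sensitivity and give more bits to the most sensitive ones.
scores = {name: sensitivity(w, hess[name], bits=2) for name, w in layers.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
bit_plan = {name: (4 if i < len(ranked) // 2 + 1 else 2) for i, name in enumerate(ranked)}
print(bit_plan)
```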

Data and Resources

Cite this as

Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu (2024). Dataset: APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models. https://doi.org/10.57702/i1dmoyg7

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined in     https://doi.org/10.1145/3649329.3658498
Authors        Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu