Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer

The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, the quadratic complexity of its global attention mechanism imposes a substantial computational burden. A common remedy spatially groups tokens for self-attention, reducing computational requirements. Nonetheless, this strategy neglects the semantic information in tokens, possibly scattering semantically linked tokens across distinct groups and thus compromising the efficacy of self-attention intended for modeling inter-token dependencies. Motivated by these insights, we introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC). SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
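The abstract does not spell out the clustering procedure, but the description ("clusters tokens based on their global semantic relevance" into balanced groups) suggests the following sketch: score each token by its similarity to a global semantic anchor (here assumed to be the mean token), rank tokens by that score, and split the ranked sequence into equal-size clusters so that attention can run within each cluster. The anchor choice and function names below are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def semantic_equitable_clustering(x, num_clusters):
    """Sketch of SEC-style balanced clustering.

    x            : (N, C) array of token features.
    num_clusters : number of equal-size groups; must divide N.

    Returns (clusters, order) where clusters is (num_clusters, N//num_clusters, C)
    and order is the semantic ranking permutation of the tokens.
    """
    N, C = x.shape
    assert N % num_clusters == 0, "balanced clustering needs N divisible by k"

    # Assumed global anchor: the mean token summarizes the whole sequence.
    anchor = x.mean(axis=0)

    # Cosine similarity of every token to the anchor = "global semantic relevance".
    scores = (x @ anchor) / (np.linalg.norm(x, axis=1) * np.linalg.norm(anchor) + 1e-8)

    # Rank tokens by relevance, then chunk the ranked sequence into
    # equal-size clusters -- semantically similar tokens land together.
    order = np.argsort(-scores)
    clusters = x[order].reshape(num_clusters, N // num_clusters, C)
    return clusters, order

# Example: 16 tokens of dimension 8 split into 4 clusters of 4 tokens each.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
clusters, order = semantic_equitable_clustering(tokens, 4)
```

Running self-attention within each of the `k` equal-size clusters instead of over all `N` tokens reduces attention cost from O(N^2) to O(N^2 / k), which is the computational saving the abstract refers to; sorting costs only O(N log N).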

Data and Resources

Cite this as

Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He (2024). Dataset: Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer. https://doi.org/10.57702/yharvnmc

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Author: Qihang Fan
More Authors: Huaibo Huang, Mingrui Chen, Ran He
Homepage: https://github.com/qhfan/SecViT