- Focal Vision Transformers
Vision transformers have achieved encouraging progress in various computer vision tasks. A common belief is that this is attributed to the competence of self-attention in...
- SkelVit: Consensus of Vision Transformers for a Lightweight Skeleton-Based Ac...
Skeleton-based action recognition receives the attention of many researchers as it is robust to viewpoint and illumination changes, and its processing is much more efficient than...
- SMMix: Self-Motivated Image Mixing for Vision Transformers
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed...
- Vision Transformers for Dense Prediction
A dataset for vision transformers
- DINOv2: Learning robust visual features without supervision
The authors propose a method for self-supervised representation learning using knowledge distillation and vision transformers.
- Query-guided Attention in Vision Transformers for Localizing Objects Using a ...
Sketch-based object localization in natural images, where given a crude hand-drawn sketch of an object, the goal is to localize all the instances of the same object on the...
- XCiT: Cross-Covariance Image Transformers
Following tremendous success in natural language processing, transformers have re-
- An image is worth 16x16 words: Transformers for image recognition at scale