Vision Transformers - Groups

TPC-ViT: Token Propagation Controller for Efficient Vision Transformers

Vision transformers (ViTs) have achieved promising results on a variety of computer vision tasks; however, their quadratic complexity in the number of input tokens has limited...

Dataset
JSON

DINOv2

The paper uses DINOv2, a vision foundation model trained on a large-scale dataset.

Dataset
JSON

SMMix

SMMix is an image mixing method in which the model under training itself drives both image and label enhancement.

Dataset
JSON

SMMix: Self-Motivated Image Mixing for Vision Transformers

CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed...

Dataset
JSON

Vision Transformers for Dense Prediction

A dataset for vision transformers

Dataset
JSON

Query-guided Attention in Vision Transformers for Localizing Objects Using a ...

Sketch-based object localization in natural images, where given a crude hand-drawn sketch of an object, the goal is to localize all the instances of the same object on the...

Dataset
JSON

DINO dataset

The DINO dataset: A large-scale vision transformer dataset

Dataset
JSON

7 datasets found