MPCvit: Searching for MPC-Friendly Vision Transformer with Heterogeneous Attention -
S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statis...
Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces. State-of-the-art FAS techniques predominantly rely on... -
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Visi...
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism’s quadratic complexity poses substantial... -
HTC-DC Net
The proposed network utilizes a classification-regression paradigm with a ViT to incorporate holistic features and local features. The regression phase with hybrid regression... -
PanoViT: Vision Transformer for Room Layout Estimation
Estimating room layout from a single panoramic image -
HSViT: Horizontally Scalable Vision Transformer
This paper introduces a horizontally scalable vision transformer (HSViT) scheme with a novel image-level feature embedding. The design of HSViT preserves the inductive bias from... -
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convolutions -
Conditional positional encodings for vision transformers -
Twins: Revisiting the design of spatial attention in vision transformers -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
VisionLLaMA is a unified and generic modeling framework for solving most vision tasks. -
PICMUS dataset
Dataset used to test the proposed Tiny-VBF model, which performs vision transformer-based image reconstruction for ultrasound imaging. -
In-silico dataset
Dataset used to test the proposed Tiny-VBF model, which performs vision transformer-based image reconstruction for ultrasound imaging. -
In-vitro dataset
Dataset used to train and test the proposed Tiny-VBF model, which performs vision transformer-based image reconstruction for ultrasound imaging. -
METER: a mobile vision transformer architecture for monocular depth estimation
Monocular depth estimation is a fundamental capability for autonomous systems that need to assess their own state and perceive the surrounding environment. -
DINO dataset
The DINO dataset: A large-scale vision transformer dataset