Dataset - LDM

MPCvit: Searching for MPC-Friendly Vision Transformer with Heterogeneous Attention
- Dataset
- JSON
S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statis...

Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces. State-of-the-art FAS techniques predominantly rely on...
- Dataset
- JSON
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Visi...

The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism’s quadratic complexity poses substantial...
- Dataset
- JSON
HTC-DC Net

The proposed network utilizes a classification-regression paradigm with a ViT to incorporate both holistic and local features. The regression phase with hybrid regression...
- Dataset
- JSON
PanoViT: Vision Transformer for Room Layout Estimation

Estimating room layout from a single panoramic image.
- Dataset
- JSON
HSViT: Horizontally Scalable Vision Transformer

This paper introduces a horizontally scalable vision transformer (HSViT) scheme with a novel image-level feature embedding. The design of HSViT preserves the inductive bias from...
- Dataset
- JSON
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convolutions
- Dataset
- JSON
Conditional positional encodings for vision transformers
- Dataset
- JSON
Twins: Revisiting the design of spatial attention in vision transformers
- Dataset
- JSON
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

VisionLLaMA is a unified and generic modeling framework for solving most vision tasks.
- Dataset
- JSON
PICMUS dataset

The dataset used to test the proposed Tiny-VBF model, a vision transformer-based image reconstruction method for ultrasound imaging.
- Dataset
- JSON
In-silico dataset

The dataset used to test the proposed Tiny-VBF model, a vision transformer-based image reconstruction method for ultrasound imaging.
- Dataset
- JSON
In-vitro dataset

The dataset used to train and test the proposed Tiny-VBF model, a vision transformer-based image reconstruction method for ultrasound imaging.
- Dataset
- JSON
METER: a mobile vision transformer architecture for monocular depth estimation

Monocular depth estimation is a fundamental capability for autonomous systems that need to assess their own state and perceive the surrounding environment.
- Dataset
- JSON
DINO dataset

The DINO dataset: A large-scale vision transformer dataset
- Dataset
- JSON
VQA 2.0

The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set containing 83k images and 444k questions, a validation set containing...
- Dataset
- JSON
COCO

Large-scale datasets [18, 17, 27, 6] have boosted the quality of text-conditional image generation. However, in some domains such datasets can be difficult to build, and usually it could...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

17 datasets found