Dataset - LDM

Investigating the Vision Transformer Model for Image Retrieval Tasks

The paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior initialization or preparation.
- Dataset
- JSON
Mask-guided Vision Transformer for Few-Shot Learning

The proposed MG-ViT model is used for few-shot learning on the Agri-ImageNet and ACFR apple detection tasks.
- Dataset
- JSON
COVID-VIT: Classification of Covid-19 from CT chest images based on vision tr...

COVID-19 classification from CT chest images based on vision transformer models
- Dataset
- JSON
Osteoarthritis Initiative (OAI) dataset

Knee OsteoArthritis (KOA) dataset used for early detection of KOA (KL-0 vs KL-2) using Vision Transformer (ViT) model with selective shuffled position embedding and key-patch...
- Dataset
- JSON
ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator

Fine-grained object discrimination using Vision Transformer
- Dataset
- JSON
A Novel Vision Transformer with Residual in Self-attention for Biomedical Ima...

Biomedical image classification requires capturing of bio-informatics based on specific feature distribution. In most of such applications, there are mainly challenges due to...
- Dataset
- JSON
MVTecAD dataset

The MVTecAD dataset is an image dataset covering 15 products.
- Dataset
- JSON
MNIST, CIFAR10, and MVTecAD datasets

The MNIST, CIFAR10, and MVTecAD datasets were used to verify the anomaly detection and localization performance of the proposed method.
- Dataset
- JSON
Shape-Sensitive Loss for Catheter and Guidewire Segmentation

A shape-sensitive loss function for catheter and guidewire segmentation using a vision transformer network.
- Dataset
- JSON
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Seg...

Weakly-supervised semantic segmentation (WSSS) using a plain Vision Transformer (ViT).
- Dataset
- JSON
FastViT

FastViT is a fast vision transformer model.
- Dataset
- JSON
EdgeVITs

EdgeVITs is a lightweight vision transformer model.
- Dataset
- JSON
JFT-300M

The JFT-300M dataset is used for training and evaluation of the proposed Circulant Channel-Specific (CCS) token-mixing MLP.
- Dataset
- JSON
ImageNet21K

The ImageNet21K dataset is used for training and evaluation of the proposed Circulant Channel-Specific (CCS) token-mixing MLP.
- Dataset
- JSON
LAION-2B

The dataset used in the paper is LAION-2B, which is a large-scale image-text dataset. The authors fine-tune a pre-trained diffusion model with a subset of LAION-2B with 10k...
- Dataset
- JSON
Diverse instance discovery: Vision-Transformer for instance-aware multi-label...

Multi-label image recognition is a practical and challenging computer vision task. The authors propose a method to leverage the advantages of Transformer with long-range...
- Dataset
- JSON
ESC-50

The dataset used for training the CNN in cough detection is composed of various modified audio clips gathered from open-source online sources. Each of these audio files...
- Dataset
- JSON
ChestX-ray14

Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated...
- Dataset
- JSON
Robustifying Vision Transformer without Retraining from Scratch

Vision Transformer (ViT) is becoming more popular in image processing. We investigate the effectiveness of test-time adaptation (TTA) on ViT, a technique that has emerged to...
- Dataset
- JSON
Structural Vision Transformer

Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

22 datasets found