OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning
The paper does not explicitly describe a dataset of its own, but it states that the authors used ImageNet, Places205, and VOC07 for evaluation.
Trypophobia dataset
A dataset used to train and test convolutional neural networks that detect trypophobia triggers.
COCO Keypoint Benchmark
The COCO keypoint benchmark is a widely used dataset for human pose estimation.
Context-and-Spatial Aware Network for Multi-Person Pose Estimation
Multi-person pose estimation is a fundamental yet challenging task in computer vision. Both rich context information and spatial information are required to precisely locate the...
Faces Dataset
A dataset used in the paper to test the GaMeS model. It contains images of six people, rendered in Blender from various perspectives, excluding the backs of...
Mip-NeRF360 dataset
A dataset used in the paper to test the GaMeS model. It contains 5 outdoor and 4 indoor scenes, each featuring intricate central objects or areas against detailed backgrounds.
LSUN Bedroom and LSUN Cat datasets
LSUN Bedroom and LSUN Cat are large-scale image datasets used to train and test the proposed approach.
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small...
Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them...
Vision Big Bird
Vision Big Bird: Random Sparsification for Full Attention
ScanNet-v2
Learning from bounding-box annotations has shown great potential in weakly-supervised 3D point cloud instance segmentation. However, we observed that existing methods would...
Youtube Faces (YTF)
The Youtube Faces (YTF) dataset contains 3,424 videos belonging to 1,595 different identities.
Labeled Faces in the Wild (LFW)
The Labeled Faces in the Wild (LFW) dataset contains 13,233 facial images belonging to 5,749 different individuals.
CIFAR-10, Tiny ImageNet, and ImageNet
The paper uses the CIFAR-10, Tiny ImageNet, and ImageNet datasets.
REalistic Single Image DEhazing (RESIDE) dataset
The RESIDE dataset is a large-scale benchmark for single-image dehazing algorithms and includes both indoor and outdoor hazy images.
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convol...
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convolutions.
Conditional positional encodings for vision transformers
Conditional positional encodings for vision transformers.
Twins: Revisiting the design of spatial attention in vision transformers
Twins: Revisiting the design of spatial attention in vision transformers.
MobileVLM: A fast, reproducible and strong vision language assistant for mobi...
MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices.
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
VisionLLaMA is a unified and generic modeling framework for solving most vision tasks.