CIFAR-10, CIFAR-100, Stanford background dataset, VOC2012 dataset, Rotten Tom...
The dataset used in the paper is not explicitly described. However, it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and Stanford... -
PASCAL VOC 2007
Multi-label image recognition is a practical and challenging task compared to single-label image classification. -
TCC Benchmark
The Temporal Colour Constancy (TCC) dataset is a benchmark for temporal color constancy. -
ChestX-ray14
Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated... -
3D Warehouse
A 3D warehouse dataset used to create the dataset mentioned in the paper. -
The Robustness Limits of SoTA Vision Models to Natural Variation
A dataset of more than 7 million images with controlled changes in pose, position, background, lighting color, and size. -
EquiMod: An Equivariance Module to Improve Visual Instance Discrimination
Recent self-supervised visual representation methods are closing the gap with supervised learning performance. Most of these successful methods rely on maximizing the similarity... -
KITTI Vision Benchmark Suite
The KITTI Vision Benchmark Suite is an autonomous-driving benchmark covering tasks such as stereo, optical flow, visual odometry, 3D object detection, and tracking. -
Zero-1-to-3
Zero-1-to-3: Zero-shot one image to 3D object. -
Position Embedding Needs an Independent Layer Normalization
The dataset used in the paper is not explicitly described, but it is mentioned that the authors analyzed the input and output of each encoder layer in Vision Transformers (VTs)... -
DINO dataset
The DINO dataset: A large-scale vision transformer dataset -
Caltech-UCSD Birds 200
The Caltech-256 object category dataset is used for the feature extraction step, and the Omniglot dataset is used for the evaluation. -
Middlebury
The Middlebury dataset is a benchmark for stereo vision and 3D reconstruction. -
StereoDiffusion
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models -
XCiT: Cross-Covariance Image Transformers
Following tremendous success in natural language processing, transformers have re-