KITTI Vision Benchmark Suite
The KITTI Vision Benchmark Suite is a dataset used for object detection and tracking in autonomous vehicles. -
Zero-1-to-3: Zero-shot one image to 3D object. -
Position Embedding Needs an Independent Layer Normalization
The paper does not explicitly describe its dataset; it states only that the authors analyzed the input and output of each encoder layer in Vision Transformers (VTs)... -
DINO dataset
The DINO dataset: A large-scale vision transformer dataset -
Caltech-UCSD Birds 200
The Caltech-256 object category dataset is used for the feature extraction step, and the Omniglot dataset is used for the evaluation. -
The Middlebury dataset is a benchmark for stereo vision and 3D reconstruction. -
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models -
XCiT: Cross-Covariance Image Transformers
Following tremendous success in natural language processing, transformers have re- -
KITTI Benchmark Suite
The KITTI benchmark suite is a large-scale dataset for 3D object detection, consisting of 7,481 training samples and 7,518 test samples. -
DeepFashion3D is a dataset for 3D garment reconstruction from single images. -
ImageNet-Compatible and CIFAR-10 datasets
The authors used the ImageNet-Compatible and CIFAR-10 datasets for targeted attack experiments. -
Reading digits in natural images with unsupervised feature learning
The paper presents a method for reading digits in natural images using unsupervised feature learning. -
Image COCO
The Image COCO dataset’s image caption annotations, from which we sample 10,000 sentences as the training set and another 10,000 as the test set. -
Selecting Receptive Fields in Deep Networks
The authors used the CIFAR-10 dataset for evaluating the quality of unsupervised representation learning algorithms. -
Deep Convolutional Generative Adversarial Networks
The authors used three datasets: Large-scale Scene Understanding (LSUN), ImageNet-1k, and a newly assembled Faces dataset. -
Facial Makeup Transfer Dataset
This dataset is used for facial makeup transfer. -
GraspNet-1Billion is a large-scale real-world grasping dataset containing 190 cluttered grasping scenes and 97,280 RGB-D images captured by 2 kinds of RGB-D cameras from 256...