-
KITTI Benchmark Suite
The KITTI benchmark suite is a large-scale dataset for 3D object detection, consisting of 7,481 training samples and 7,518 test samples. -
DeepFashion3D
DeepFashion3D is a dataset for 3D garment reconstruction from single images. -
ImageNet-Compatible and CIFAR-10 datasets
The authors used the ImageNet-Compatible and CIFAR-10 datasets for targeted attack experiments. -
Reading digits in natural images with unsupervised feature learning
The paper presents a method for reading digits in natural images using unsupervised feature learning. -
Image COCO
The Image COCO dataset’s image caption annotations, where we sample 10,000 sentences as the training set and another 10,000 as the test set. -
Selecting Receptive Fields in Deep Networks
The authors used the CIFAR-10 dataset for evaluating the quality of unsupervised representation learning algorithms. -
Deep Convolutional Generative Adversarial Networks
The authors used three datasets: Large-scale Scene Understanding (LSUN), ImageNet-1k, and a newly assembled Faces dataset. -
Facial Makeup Transfer Dataset
This dataset is used for facial makeup transfer. -
GraspNet-1Billion
GraspNet-1Billion is a large-scale real-world grasping dataset containing 190 cluttered grasping scenes and 97,280 RGB-D images captured by 2 kinds of RGB-D cameras from 256... -
Grasp-Anything-6D
Grasp-Anything-6D is a large-scale dataset for language-driven 6-DoF grasp detection in 3D point clouds. It consists of 1M point cloud scenes with comprehensive object grasping... -
Spherical-MNIST, atomic energy, Shrec17, diffusion MRI
The paper uses the spherical-MNIST, atomic energy, and Shrec17 data sets for classification tasks, and diffusion MRI data for group testing. -
DDP: Diffusion Model for Dense Visual Prediction
The DDP framework is a simple, efficient, and powerful framework for dense visual predictions based on conditional diffusion. -
MST: Masked Self-Supervised Transformer for Visual Representation
The proposed method is a self-supervised learning approach for visual representation learning, which can explicitly capture the local context of an image while preserving the... -
SSIMLayer: Towards Robust Deep Representation Learning via Nonlinear Structur...
The proposed SSIMLayer is a new nonlinear computational layer with high learning capacity for deep convolutional neural network architectures. -
Total capture: A 3D deformation model for tracking faces, hands, and bodies
Dataset for tracking faces, hands, and bodies in videos. -
Video based reconstruction of 3D people models
Real-world dataset for reconstructing 3D people models from monocular video. -
HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Tex...
Real-world datasets for reconstructing human avatars from monocular video, including ZJU-MoCap and People-Snapshot.