-
FashionMNIST
FashionMNIST is an image dataset of clothing items, used here to evaluate the performance of STN and P-STN models in recovering transformations and augmentations. -
Stanford 3D Semantic Parsing Dataset
The Stanford 3D semantic parsing dataset contains 3D scans of indoor areas captured with Matterport scanners, fully annotated for various semantic classes including structural... -
ShapeNet Part Segmentation Dataset
The ShapeNet part segmentation dataset contains 16,881 shapes from 16 categories, annotated with 50 parts in total; every 3D point carries a part category label. -
KITTI Object Detection
The KITTI Object Detection dataset provides a comprehensive set of 2D object detection data for real-world driving scenarios, particularly focusing on car detection. -
Bosch Small Traffic Lights Dataset
The Bosch Small Traffic Lights Dataset presents a challenge for detecting small objects with partial occlusions, especially useful for localizing small objects under weak... -
MIOvision Traffic Camera Dataset (MIO-TCD)
MIOvision Traffic Camera Dataset (MIO-TCD) is the largest public benchmark for object detection in traffic surveillance images, with a vast array of annotated images for training... -
SQuAD dataset
BERT is pretrained on a concatenation of Wikipedia and BooksCorpus; SQuAD is the downstream question-answering task on which it is fine-tuned and evaluated. -
PennTreebank
The PennTreebank dataset is used for language modeling, containing a large annotated corpus of English text to evaluate the task of predicting the next character or word based... -
Nottingham
The Nottingham dataset contains British and American folk tunes and is used to evaluate models' capabilities in polyphonic music modeling. -
Spherical MNIST
Spherical MNIST is constructed from the MNIST dataset by back-projecting the digits into an equirectangular projection with a resolution of 160x80. The digit labels are used to... -
1 Billion Word Language Model Benchmark
The 1 Billion Word Language Model Benchmark is a dataset used for measuring progress in statistical language modeling, consisting of a large collection of text data. -
Caltech-UCSD Birds 200 dataset (CUB-200)
The 2011 Caltech-UCSD Birds 200 dataset (CUB-200) contains 11,788 images of 200 bird species and is widely used as a benchmark for text-to-image generation. -
TaoMultimodal Dataset
A large-scale dataset for multi-modal pretraining in Chinese, consisting of 3.1M image-text pairs from the mobile Taobao platform.