Visual Question Answering (VQA)
The VQA dataset consists of 248,349 training questions, 121,512 validation questions and 244,302 testing questions, generated on a total of 123,287 images. -
VQAv2 dataset
The VQAv2 dataset contains open-ended questions on 265k images, with 5.4 questions per image on average. -
Waymo Open Perception
The Waymo Open Perception dataset is a large-scale dataset for autonomous driving perception. -
Multimodal Attribute Extraction (MAE) dataset
The Multimodal Attribute Extraction (MAE) dataset is a large dataset containing mixed-media data for over 2.2 million commercial product items, collected from a large number of... -
Visual Genome Relationship Dataset
The Visual Genome Relationship Dataset contains 108,077 images and 1,531,448 relationships. -
Visual Relationship Dataset
The Visual Relationship Dataset contains 5,000 images with 100 object categories and 70 predicates. -
SemEval-2023 Task 1: Visual Word Sense Disambiguation
The SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task dataset consists of a silver dataset with 12,869 V-WSD instances. Each sample is a 4-tuple ⟨f, c, I, i∗ ∈ I⟩ where... -
Geometrical Illusions Dataset
The dataset is a collection of images used to study geometrical illusions. -
Visual Cortex Dataset
The dataset is a collection of images used to study the visual cortex. -
Hyperspectral pasture image dataset
A hyperspectral pasture image dataset with imbalanced class distributions and disparate volumes of data among different sites.