27 datasets found

Tags: visual question answering

  • SMART-101 dataset

    The dataset for the SMART-101 challenge consists of 101 unique puzzles that require a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning,...
  • High Quality Image Text Pairs

    The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.
  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • Visual7W

    Visual7W is a dataset for Visual Question Answering (VQA), containing 7,000 images with 7,000 queries.
  • Mutan: Multimodal Tucker Fusion for Visual Question Answering

    The dataset used in the paper is a collection of images and corresponding referring expressions.
  • VQAvs

    VQAvs is a dataset for visual question answering, containing questions answerable from the images alone.
  • VQA-CPv1 and VQA-CPv2

    VQA-CPv1 and VQA-CPv2 (Changing Priors) are visual question answering datasets that restructure the VQA splits so that answer distributions differ between training and test sets.
  • Object Attribute Matters in Visual Question Answering

    Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. The proposed approach utilizes object attributes to...
  • SpatialSense

    A dataset for visual spatial relationship classification (VSRC) with nine well-defined spatial relations.
  • Winoground

    The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0), (I1, C1).
  • VQA

    The VQA dataset is a large-scale visual question answering dataset consisting of images paired with open-ended questions that require natural language answers.
  • MovieQA, TVQA, AVSD, EQA, Embodied QA

    A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.
  • VQA v2.0

    The VQA v2.0 dataset is used to evaluate the proposed joint model; its answers are balanced to minimize the effectiveness of learning dataset priors.
  • GQA

    The GQA dataset is a visual question answering dataset focused on compositional question answering and visual reasoning over real-world images.
  • Conceptual Captions 12M

    The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
  • Sort-of-CLEVR

    Sort-of-CLEVR is a simplified visual question answering dataset used to evaluate relational reasoning.
  • CLEVR dataset

    The CLEVR dataset is a diagnostic dataset for visual question answering, where each rendered image is annotated with compositional questions.
  • Visual7W dataset

    The Visual7W dataset is a visual question answering dataset consisting of images paired with corresponding questions.
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.