21 datasets found

Groups: Visual Question Answering Organizations: No Organization Formats: JSON

  • CLEVR-Humans

    The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
  • VQA-CP

    The VQA-CP dataset is a split of the VQA dataset, designed to test generalization skills across changes in the answer distribution between the training and the test sets.
  • GQA-OOD: Out-of-Domain VQA Benchmark

    GQA-OOD is a benchmark dedicated to out-of-domain VQA evaluation.
  • GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question...

    GQA is a new dataset for real-world visual reasoning and compositional question answering.
  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • FVQA

    FVQA is a fact-based visual question answering dataset, containing 2,190 images and 5,826 (question, answer) pairs, with supporting facts selected from knowledge bases.
  • Visual7W

    Visual7W is a dataset for Visual Question Answering (VQA). It contains 7,000 images with 7,000 queries.
  • VQA 1.0

    The VQA 1.0 dataset is a large-scale dataset for visual question answering, containing 15,000 images with 50,000 questions.
  • VQA

    The VQA dataset is a large-scale visual question answering dataset that consists of image-question pairs that require natural language answers.
  • MovieQA, TVQA, AVSD, EQA, Embodied QA

    A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.
  • VQA v2.0

    We use the VQA v2.0 dataset for the evaluation of our proposed joint model, where the answers are balanced in order to minimize the effectiveness of learning dataset priors.
  • GQA

    The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning about real-world images.
  • TGIF-QA

    The TGIF-QA dataset consists of 165,165 QA pairs chosen from 71,741 animated GIFs. To evaluate the spatiotemporal reasoning ability at the video level, the TGIF-QA dataset designs...
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNNs). On the other hand, data augmentation, as one of the...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.
  • Measuring Machine Intelligence through Visual Question Answering

    Measuring machine intelligence through visual question answering.
  • VQA: Visual Question Answering

    Visual Question Answering (VQA) has emerged as a prominent multi-discipline research problem in both academia and industry.
  • Hierarchical Question-Image Co-Attention for Visual Question Answering

    A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the...
  • CLEVR

    CLEVR images contain objects characterized by a set of attributes (shape, color, size and material). The questions are grouped into 5 categories: Exist, Count, CompareInteger,...
  • Visual Genome

    The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships.