14 datasets found

Tags: visual question answering

  • OK-VQA

    The OK-VQA dataset is a visual question answering benchmark requiring external knowledge.
  • Visual7W

    Visual7W is a dataset for Visual Question Answering (VQA). It contains 7,000 images with 7,000 queries.
  • Visual Question Answering as Reading Comprehension

    Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help...
  • Multimodal Visual Patterns (MMVP) Benchmark

    The Multimodal Visual Patterns (MMVP) benchmark is a dataset used to evaluate the visual question answering capabilities of multimodal large language models (MLLMs).
  • VQA

    The VQA dataset is a large-scale visual question answering dataset that consists of image-question pairs that require natural language answers.
  • MovieQA, TVQA, AVSD, EQA, Embodied QA

    A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.
  • VQA v2.0

    We use the VQA v2.0 dataset for the evaluation of our proposed joint model, where the answers are balanced in order to minimize the effectiveness of learning dataset priors.
  • GQA

    The GQA dataset is a visual question answering dataset that focuses on compositional question answering and visual reasoning about real-world images.
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.
  • Florence

    A large-scale dataset for visual question answering.
  • CLEVR

    CLEVR images contain objects characterized by a set of attributes (shape, color, size and material). The questions are grouped into 5 categories: Exist, Count, CompareInteger,...
  • Visual Genome

    The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships.
  • COCO-QA

    The COCO-QA dataset is used for the visual question answering task. It consists of 123,287 images, with 78,736 training questions and 38,948 test questions.