68 datasets found

  • VQA-CP v2

    VQA-CP v2 (VQA under Changing Priors) is a standard out-of-distribution benchmark for VQA: the VQA v2 data is re-split so that the answer distribution of each question type differs between train and test, penalizing models that rely on language priors (see the priors sketch after this list).
  • Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

    This paper investigates whether a vision-language pre-trained (VLP) model can be compressed and debiased simultaneously by searching for sparse and robust subnetworks.
  • Conceptual Captions 12M

    The Conceptual Captions 12M (CC12M) dataset consists of about 12 million web images paired with alt-text-derived captions; it relaxes the filtering pipeline of Conceptual Captions (CC3M), trading caption precision for scale (the loading sketch after this list shows the URL/caption release format).
  • Sort-of-CLEVR

    The dataset used in the paper is Sort-of-CLEVR (Santoro et al., 2017), a simplified CLEVR-like visual question answering dataset in which each image contains 2D colored shapes and each question is either relational (about relations between objects) or non-relational (about a single object).
  • VQA-CP v2 and VQA 2.0

    The datasets used in the paper are VQA-CP v2 and VQA 2.0, two standard visual question answering benchmarks; VQA 2.0 serves as the in-distribution benchmark, while VQA-CP v2 re-splits it so that answer priors differ between train and test.
  • Meta-VQA

    The Meta-VQA dataset is a modification of the VQA v2.0 dataset for Visual-Question-Answering, composed of 1234 unique tasks (questions), split into 870 training tasks and 373...
  • CLEVR dataset

    The CLEVR dataset is a diagnostic dataset for compositional language and visual reasoning: each rendered image of simple 3D shapes is paired with automatically generated questions, and every question is annotated with the functional program that yields its answer.
  • Visual7W dataset

    The Visual7W dataset is a visual question answering dataset of multiple-choice questions over natural images, covering seven question types (what, where, when, who, why, how, and which), with answers grounded to image regions.
  • VQAv2

    VQA v2.0 balances the original VQA dataset by pairing every question with two similar images that lead to different answers, weakening language-prior shortcuts. It contains roughly 1.1M questions over about 204k COCO images, each question answered by 10 human annotators (the consensus accuracy metric is sketched after this list).
  • Extended RSVQAxBEN

    The extended RSVQAxBEN dataset is an extension of the RSVQAxBEN dataset that includes all the spectral bands of Sentinel-2 images at 10 m and 20 m spatial resolution (a band-stacking sketch follows this list).
  • RSVQAxBEN

    The RSVQAxBEN dataset is a large-scale benchmark dataset for remote sensing visual question answering, based on the BigEarthNet (BEN) archive and containing 590,326 Sentinel-2 image patches with automatically generated question-answer pairs.
  • RSVQA-LR

    The RSVQA-LR dataset is a large-scale benchmark dataset for remote sensing visual question answering, constructed using 7 Sentinel-2 tiles acquired over the Netherlands, from...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.
  • NLVR2

    NLVR2 (Natural Language for Visual Reasoning for Real) pairs a natural-language sentence with two photographs; the task is to decide whether the sentence is true of the image pair. It contains about 107k examples and is used in the paper as one task in a sequence of vision-and-language tasks.
  • Measuring Machine Intelligence through Visual Question Answering

    An overview article arguing that free-form, open-ended visual question answering is a well-suited task for measuring machine intelligence, since answering arbitrary questions about images requires multimodal perception, language understanding, and commonsense knowledge.
  • VQA: Visual Question Answering

    Visual Question Answering (VQA) has emerged as a prominent multi-discipline research problem in both academia and industry: given an image and a free-form, open-ended natural-language question about it, the task is to produce an accurate natural-language answer.
  • Hierarchical Question-Image Co-Attention for Visual Question Answering

    A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. This paper argues that question attention matters as well, and proposes a hierarchical co-attention model that jointly reasons over image and question attention.
  • LXMERT

    LXMERT (Tan and Bansal, 2019) is a cross-modality transformer framework rather than a dataset; the paper loads its pre-trained weights and fine-tunes them for visual question answering with the adaptive approaches it proposes (a loading sketch follows this list).
  • VQA 2.0

    The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set containing 83k images and 444k questions, a validation set containing 41k images and 214k questions, and a test set containing 81k images and 448k questions (a loading sketch follows this list).
  • LLaVA-1.5

    LLaVA-1.5 is a multimodal large language model rather than a dataset: it couples a CLIP vision encoder with a LLaMA-based (Vicuna) language model, and the paper uses the 7-billion-parameter variant for multimodal instruction following and visual question answering (an inference sketch follows this list).
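
The VQA-CP v2 entry above describes a re-split in which answer priors change between train and test. A minimal sketch of what "changing priors" means, assuming hypothetical `question_type` and `answer` fields on each example:

```python
# Sketch: per-question-type answer distributions. In VQA-CP v2 these
# distributions are, by construction, very different between the train
# and test splits (e.g. "how many" questions may mostly have answer "2"
# in train but not in test). Field names here are assumptions.
from collections import Counter, defaultdict

def answer_priors(examples):
    """Map each question type to its normalized answer distribution."""
    by_type = defaultdict(Counter)
    for ex in examples:
        by_type[ex["question_type"]][ex["answer"]] += 1
    return {
        qtype: {ans: n / sum(counts.values()) for ans, n in counts.items()}
        for qtype, counts in by_type.items()
    }

# A model that memorizes the train-split priors will fail on the test
# split, which is what VQA-CP v2 is designed to measure.
```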
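
Conceptual Captions 12M is released as tab-separated (image URL, caption) pairs rather than as image files. A minimal reader sketch; the file name and column order are assumptions about the release format:

```python
# Sketch: iterate over CC12M's TSV of (image URL, caption) pairs.
def iter_cc12m(path="cc12m.tsv"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            url, caption = line.rstrip("\n").split("\t", 1)
            yield url, caption

for url, caption in iter_cc12m():
    print(url, "->", caption)
    break  # just show the first pair
```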
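
The VQAv2 and VQA 2.0 entries above rely on the consensus accuracy metric, where each question has 10 human answers. A minimal sketch of the commonly used simplified form, min(#matching annotators / 3, 1); note the official evaluation additionally normalizes punctuation, articles, and number words:

```python
# Sketch: simplified VQA consensus accuracy. A predicted answer gets
# full credit if at least 3 of the 10 annotators gave it, and partial
# credit otherwise.
from collections import Counter

def vqa_accuracy(predicted, human_answers):
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[predicted.strip().lower()] / 3.0, 1.0)

print(vqa_accuracy("2", ["2", "2", "two", "2", "3", "2", "2", "2", "2", "2"]))  # 1.0
print(vqa_accuracy("3", ["2", "2", "two", "2", "3", "2", "2", "2", "2", "2"]))  # ~0.33
```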
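
The extended RSVQAxBEN entry mixes Sentinel-2 bands at 10 m and 20 m resolution, so the 20 m bands must be resampled before all bands can be stacked into one array. A minimal sketch with rasterio, assuming per-band GeoTIFF files (the file names are hypothetical):

```python
# Sketch: upsample a 20 m Sentinel-2 band onto the 10 m grid and stack.
import numpy as np
import rasterio
from rasterio.enums import Resampling

with rasterio.open("B04_10m.tif") as src:       # red band at 10 m
    red = src.read(1)

with rasterio.open("B11_20m.tif") as src:       # SWIR band at 20 m
    swir = src.read(
        1,
        out_shape=red.shape,                    # resample onto the 10 m grid
        resampling=Resampling.bilinear,
    )

stack = np.stack([red, swir])                   # shape: (bands, height, width)
print(stack.shape)
```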
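
The LXMERT entry mentions loading the pre-trained weights of Tan and Bansal (2019). A minimal sketch with Hugging Face transformers; the random tensors are placeholders standing in for the Faster R-CNN region features LXMERT actually consumes:

```python
# Sketch: load pre-trained LXMERT and run one forward pass with
# placeholder visual features (real ones come from an object detector).
import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("What color is the dog?", return_tensors="pt")
visual_feats = torch.rand(1, 36, 2048)   # 36 region features, 2048-d each
visual_pos = torch.rand(1, 36, 4)        # normalized bounding-box coordinates

outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(outputs.language_output.shape, outputs.vision_output.shape)
```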
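
For the VQA 2.0 entry, the official annotations can be downloaded from visualqa.org; a community mirror on the Hugging Face Hub makes quick inspection easy. A minimal sketch, where the repo name and field names are assumptions about that mirror:

```python
# Sketch: load the VQA v2 validation split from a community Hub mirror.
from datasets import load_dataset

vqa = load_dataset("HuggingFaceM4/VQAv2", split="validation")
ex = vqa[0]
print(ex["question"], "->", ex["multiple_choice_answer"])
```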
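
The LLaVA-1.5 entry refers to the 7B-parameter model. A minimal inference sketch with Hugging Face transformers; the converted checkpoint name and the image URL are assumptions:

```python
# Sketch: answer one visual question with LLaVA-1.5 (7B).
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"            # assumed converted checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/dog.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```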