VQAv2 dataset
The VQAv2 dataset contains open-ended questions about 265k images, with 5.4 questions per image on average.
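As a rough illustration of the dataset's scale, the two figures quoted above imply a total question count on the order of 1.4M (a back-of-the-envelope sketch using only the numbers stated here, not official dataset statistics):

```python
# Rough scale estimate for VQAv2, based on the figures quoted above.
num_images = 265_000          # images in the dataset
questions_per_image = 5.4     # average questions per image (stated above)

total_questions = int(num_images * questions_per_image)
print(f"~{total_questions:,} questions in total")  # ~1,431,000
```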
CLEVR-Humans
The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
Image Captioning and Visual Question Answering
The dataset is used for both image captioning and visual question answering.
LLaVA-Instruct-150k
A visual question answering dataset of 150k instruction-following samples.
GQA-OOD: Out-of-Domain VQA Benchmark
GQA-OOD is a benchmark dedicated to out-of-domain VQA evaluation.
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
GQA is a new dataset for real-world visual reasoning and compositional question answering.
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Visual Question Answering (VQA) has achieved great success thanks to the rapid development of deep neural networks (DNNs). Meanwhile, data augmentation, as one of the...
Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
This paper investigates whether a VLP model can be compressed and debiased simultaneously by searching for sparse and robust subnetworks.