Visual Question Answering (VQA)

doi:10.57702/qco2vve1

The VQA dataset consists of 248,349 training questions, 121,512 validation questions, and 244,302 test questions, posed about a total of 123,287 images.
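As a quick sanity check on the split sizes quoted above, the following minimal Python sketch totals the questions and reports each split's share. The counts are copied from the description; the names `SPLIT_SIZES` and `split_summary` are illustrative only and are not part of any VQA tooling.

```python
# Split sizes as quoted in the dataset description (number of questions).
SPLIT_SIZES = {"train": 248_349, "validation": 121_512, "test": 244_302}


def split_summary(sizes):
    """Return the total question count and each split's fraction of it."""
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


total, shares = split_summary(SPLIT_SIZES)
print(f"total questions: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these figures the total comes to 614,163 questions across the three splits.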

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Antol et al., Zhou et al., Noh et al., Ren et al., Li et al., Kim et al., Xu et al., Yang et al., Ye et al., Ben-Younes et al., Ilievski et al., Wu et al., Xie et al., Kiros et al., He et al., Shih et al., Wang et al., Zitnick et al. (2025). Dataset: Visual Question Answering (VQA). https://doi.org/10.57702/qco2vve1

DOI retrieved: January 3, 2025

Additional Info

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.1711.06794
Author	Antol et al.
More Authors	Zhou et al., Noh et al., Ren et al., Li et al., Kim et al., Xu et al., Yang et al., Ye et al., Ben-Younes et al., Ilievski et al., Wu et al., Xie et al., Kiros et al., He et al., Shih et al., Wang et al., Zitnick et al.
Homepage	https://www.aaai.org/Conferences/AAAI/2015/