Visual Question Answering (VQA)

The VQA dataset consists of 248,349 training questions, 121,512 validation questions, and 244,302 test questions (614,163 questions in total), generated on a total of 123,287 images.
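
As a quick sanity check on the split sizes quoted above, the following minimal sketch sums the three splits and reports each split's share of the questions. The counts are copied from the description; the names `SPLIT_QUESTIONS` and `split_summary` are hypothetical helpers introduced only for illustration and are not part of any official VQA tooling.

```python
# Question counts per split, copied from the dataset description above.
# The dictionary and function names are illustrative assumptions.
SPLIT_QUESTIONS = {
    "train": 248_349,
    "validation": 121_512,
    "test": 244_302,
}


def split_summary(splits):
    """Return the total question count and each split's fraction of it."""
    total = sum(splits.values())
    shares = {name: count / total for name, count in splits.items()}
    return total, shares


total, shares = split_summary(SPLIT_QUESTIONS)
print(f"total questions: {total:,}")
for name, share in shares.items():
    print(f"{name:>10}: {SPLIT_QUESTIONS[name]:>8,} ({share:.1%})")
```

The same helper can be reused to check the counts after loading the actual question files, assuming they are parsed into one list per split.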

Cite this as

Antol et al., Zhou et al., Noh et al., Ren et al., Li et al., Kim et al., Xu et al., Yang et al., Ye et al., Ben-Younes et al., Ilievski et al., Wu et al., Xie et al., Kiros et al., He et al., Shih et al., Wang et al., Zitnick et al. (2025). Dataset: Visual Question Answering (VQA). https://doi.org/10.57702/qco2vve1

DOI retrieved: January 3, 2025

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined In: https://doi.org/10.48550/arXiv.1711.06794
Author: Antol et al.
More Authors: Zhou et al., Noh et al., Ren et al., Li et al., Kim et al., Xu et al., Yang et al., Ye et al., Ben-Younes et al., Ilievski et al., Wu et al., Xie et al., Kiros et al., He et al., Shih et al., Wang et al., Zitnick et al.
Homepage: https://www.aaai.org/Conferences/AAAI/2015/