Mutan: Multimodal Tucker Fusion for Visual Question Answering

The dataset used in the paper is a collection of images and corresponding referring expressions.

Data and Resources

Cite this as

Hedi Ben-Younes, Rémi Cadene, Matthieu Cord, Nicolas Thome (2024). Dataset: Mutan: Multimodal Tucker Fusion for Visual Question Answering. https://doi.org/10.57702/rzhvbow6

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2105.07175
Author Hedi Ben-Younes
More Authors
Rémi Cadene
Matthieu Cord
Nicolas Thome
Homepage https://arxiv.org/abs/1706.05587