Mutan: Multimodal Tucker Fusion for Visual Question Answering

The dataset used in the paper is a collection of images and corresponding referring expressions.

BibTex: