ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Di...
Image-grounded dialogue generation in zero-resource scenarios
Mutan: Multimodal Tucker Fusion for Visual Question Answering
The dataset used in the paper is a collection of images and corresponding referring expressions.
Multimodal Information Fusion for Urban Scene Understanding
A dataset for urban scene understanding.