- Multi30k dataset for multimodal machine translation
  A multimodal machine translation dataset, Multi30k, which has short, simple, and repetitive sentences.
- How2: A large-scale dataset for multimodal language understanding
  A large-scale multimodal machine translation dataset named How2, whose mean sentence length is 1.57 times longer than that of Multi30k and which has no repetition.
- VisualGenome datasets
  The VisualGenome datasets containing Bengali, Hindi, and Malayalam sentences for fine-tuning.
- LIUM-CVC Submissions for WMT18 Multimodal Translation Task
  Multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT18 Shared Task on Multimodal Translation.