How2: A large-scale dataset for multimodal language understanding

How2 is a large-scale multimodal machine translation dataset. Its mean sentence length is 1.57 times longer than that of Multi30k, and it contains no repetition.

BibTeX: