Wikipedia Image Text

The Wikipedia Image Text (WIT) dataset is a large-scale multimodal dataset used to train and evaluate the MURAL model.

BibTeX: