-
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three...
-
Playing Lottery Tickets with Vision and Language
Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have achieved state-of-the-art performance across a wide...