Dataset - LDM

VILLA

The dataset used in the VILLA paper for vision-and-language representation learning.
- Dataset
- JSON

Room-to-Room (R2R) dataset

The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three...
- Dataset
- JSON

Playing Lottery Tickets with Vision and Language

Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have achieved state-of-the-art performance across a wide...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

3 datasets found