SNLI-VE

The dataset used in the paper is a set of sequential vision-and-language tasks, where each task consists of an image and a text input.
- Dataset
- JSON
