1 dataset found

Formats: JSON
Tags: multimodal input

  • SNLI-VE

    The dataset used in the paper is a set of sequential vision-and-language tasks, in which each task consists of an image and a text input.

You can also access this registry through the API (see API Docs).