29 datasets found

Tags: Text

Filter Results
  • NLVR2

    The dataset used in the paper is a set of sequential vision-and-language tasks, where each task consists of an image and a text input.
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • LXMERT

    The LXMERT dataset is used for visual question answering task. It uses pre-trained weights provided by Tan and Bansal (2019) and fine-tunes it with adaptive approaches mentioned...
  • VQA 2.0

    The VQA 2.0 dataset is used for visual question answering task. It consists of three sets with a train set containing 83k images and 444k questions, a validation set containing...
  • MSR-VTT

    The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...
  • LAION

    The dataset used in the paper is not explicitly described, but it is mentioned that it is a large-scale captioned image dataset (LAION) used to train the Stable Diffusion model.
  • MS-COCO

    Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
  • FFHQ

    Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
  • COCO

    Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
You can also access this registry using the API (see API Docs).