Dataset - LDM

Text2Shape

Text2Shape is a dataset of 8,447 table instances and 6,591 chair instances from the ShapeNet dataset, along with 75,344 natural language descriptions.
- Dataset
- JSON
Text8

Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
- Dataset
- JSON
NLVR2

The dataset used in the paper is a set of sequential vision-and-language tasks, each consisting of an image and a text input.
- Dataset
- JSON
Penn Treebank

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
- Dataset
- JSON
LXMERT

The LXMERT dataset is used for the visual question answering task. It uses pre-trained weights provided by Tan and Bansal (2019) and fine-tunes them with adaptive approaches mentioned...
- Dataset
- JSON
VQA 2.0

The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set containing 83k images and 444k questions, a validation set containing...
- Dataset
- JSON
MSR-VTT

MSR-VTT is a large video description dataset for bridging video and language. It contains 10k video clips with lengths varying from 10 to...
- Dataset
- JSON
LAION

LAION is a large-scale captioned image dataset used to train the Stable Diffusion model. The paper does not describe the dataset in further detail.
- Dataset
- JSON
MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
- Dataset
- JSON
FFHQ

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
- Dataset
- JSON
COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

31 datasets found