The Birds-to-Words dataset contains 15,931 images (12,770 training and 3,151 testing) tagged with descriptions of fine-grained differences between pairwise bird images.
The FashionIQ dataset contains images of fashion products over 3 categories: Dress, Toptee, and Shirt, with 46,609 images in the training and 31,075 images in the validation set.
The dataset used in the paper is a multi-view clustering dataset, which contains 6 views of 30000 samples each. The dataset is used to evaluate the performance of the proposed...
The Flickr30k dataset is widely utilized for image caption and image-text retrieval tasks, providing a substantial collection of images with associated captions.
The dataset used in the paper is YFCC100M, a large-scale video dataset. The dataset is used for foreground and background patch extraction and object recognition tasks.
Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...