Dataset - LDM

Image-Text Retrieval

The dataset used in the paper for image-text retrieval.

Alimama retrieval dataset

The Alimama retrieval dataset is a large-scale dataset covering daily search logs from three scenarios: Visual Search (VS), Similar Search (SS), and Interest Search (IS) on...

Robust04

The dataset used in the paper is Robust04, a news corpus containing 0.5M documents and 249 queries.

MSR-VTT-CN

Bilingual video-text retrieval dataset

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

Cross-lingual cross-modal retrieval with noise-robust learning for low-resource languages

Stickers Dataset

The image-only stickers dataset used for testing the kNN-Diffusion model.

Public Multimodal Dataset

The dataset used for training the kNN-Diffusion model, which uses large-scale retrieval to train a text-to-image model without any text data.

AudioCaps

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.

LSMDC

The LSMDC movie description dataset consists of 118,081 short video clips extracted from 202 movies, each annotated with a single caption.

DeepFashion dataset

The DeepFashion dataset is a large-scale dataset for person image synthesis, containing 101,966 pairs of images with different poses and clothing.

MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...

ActivityNet Captions

ActivityNet Captions is a benchmark dataset proposed for dense video captioning. There are 20K untrimmed videos in total, and each video has several annotated segments with...

MSR-VTT

The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...

COCO

Large-scale datasets [18, 17, 27, 6] boosted text-conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

You can also access this registry using the API (see API Docs).

14 datasets found