Dataset - LDM

RefCOCO+ and RefCOCOg

The RefCOCO+ and RefCOCOg datasets are benchmarks for referring expression comprehension. They pair images with natural language expressions that refer to specific objects in those images.
- Dataset
- JSON
Talk2Car

The Talk2Car dataset is a benchmark for language grounding for autonomous vehicles. It contains images of urban scenes and natural language commands referring to objects in the...
- Dataset
- JSON
Flickr30K and MSCOCO

The datasets used in the paper are Flickr30K and MSCOCO, which are used for image-text matching and image captioning tasks.
- Dataset
- JSON
Stacked Cross Attention

The dataset used in the paper "Stacked Cross Attention for Image-Text Matching".
- Dataset
- JSON
ReasonSeg

The ReasonSeg dataset is a benchmark for reasoning segmentation, a task that demands a nuanced comprehension of intricate queries to accurately pinpoint object regions.
- Dataset
- JSON
A leaderboard dataset for zero-shot referring expression comprehension

The dataset used in the paper for the zero-shot referring expression comprehension task.
- Dataset
- JSON
Language Models with Image Descriptors

The Language Models with Image Descriptors dataset is used to evaluate the performance of the InstructVid2Vid model.
- Dataset
- JSON
SpatialSense

A dataset for visual spatial relationship classification (VSRC) with nine well-defined spatial relations.
- Dataset
- JSON
Flickr30K Entities

The Flickr30K Entities dataset consists of 31,783 images each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
- Dataset
- JSON
ALADIN

The ALADIN dataset is a custom dataset created for the ALADIN paper.
- Dataset
- JSON
VQAv2

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...
- Dataset
- JSON
Compositional Visual Genome

The Compositional Visual Genome (ComVG) dataset is a reconstructed dataset of the Visual Genome (Krishna et al., 2017) dataset, containing 108,007 images annotated with 2.3...
- Dataset
- JSON
RefCOCOg

The RefCOCOg dataset is a reconstructed dataset of the MS-COCO dataset, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
- Dataset
- JSON
Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.
- Dataset
- JSON
MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
- Dataset
- JSON
MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

16 datasets found