- Flickr30K and MSCOCO
  Flickr30K and MSCOCO are the datasets used in the paper; both provide image-caption pairs for image-text matching and image captioning tasks (a loading sketch follows this list).
- Stacked Cross Attention
  Stacked Cross Attention (SCAN) is a model for image-text matching rather than a dataset; the paper evaluates it on the Flickr30K and MSCOCO datasets (an illustrative sketch of its cross-attention step follows this list).
- Language Models with Image Descriptors
  The Language Models with Image Descriptors dataset is used to evaluate the performance of the InstructVid2Vid model.
- SpatialSense
  A dataset for visual spatial relationship classification (VSRC) with nine well-defined spatial relations (a record-format sketch follows this list).
- Compositional Visual Genome
  The Compositional Visual Genome (ComVG) dataset is a reconstruction of the Visual Genome (Krishna et al., 2017) dataset, containing 108,007 images annotated with 2.3...
- Visual Genome
  The Visual Genome dataset is a large-scale dataset for scene understanding and visual question answering, containing roughly 108K images, each densely annotated with entities (objects), attributes, and relationships (an annotation-parsing sketch follows this list).
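
As a usage illustration for the Flickr30K and MSCOCO entry, the sketch below loads image-caption pairs from both datasets with torchvision. The local paths are placeholders and both datasets must already be downloaded (CocoCaptions also needs pycocotools installed); this is a minimal example, not the paper's data pipeline.

```python
# Minimal sketch: loading image-caption pairs from Flickr30K and MSCOCO
# with torchvision. Paths are placeholders; both datasets must already be
# downloaded locally, and CocoCaptions requires pycocotools.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Flickr30K: ~31K images, five crowd-sourced captions each.
flickr = datasets.Flickr30k(
    root="data/flickr30k/images",                        # placeholder path
    ann_file="data/flickr30k/results_20130124.token",    # caption file shipped with the dataset
    transform=transform,
)

# MSCOCO captions: five captions per image, commonly used for retrieval and captioning.
coco = datasets.CocoCaptions(
    root="data/coco/train2014",                           # placeholder path
    annFile="data/coco/annotations/captions_train2014.json",
    transform=transform,
)

image, captions = coco[0]   # a tensor image and its list of caption strings
print(len(flickr), len(coco), captions[0])
```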
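Since Stacked Cross Attention is a model rather than a dataset, a rough sketch of its text-to-image attention step may help: each word attends over image region features, and the per-word relevance scores are pooled into a single image-sentence similarity. The tensor shapes, the smoothing factor `lam`, and the use of average pooling are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of SCAN-style text-to-image stacked cross attention
# (Lee et al., 2018): words attend over image regions, and per-word relevance
# scores are pooled into one image-sentence similarity.
import torch
import torch.nn.functional as F

def scan_similarity(regions, words, lam=9.0):
    """regions: (k, d) region features; words: (n, d) word features."""
    v = F.normalize(regions, dim=-1)           # (k, d) unit-norm region features
    e = F.normalize(words, dim=-1)             # (n, d) unit-norm word features
    s = e @ v.t()                              # (n, k) word-region cosine similarities
    s = F.normalize(s.clamp(min=0), dim=0)     # clamp negatives, L2-normalize over the word axis
    attn = F.softmax(lam * s, dim=1)           # each word attends over the k regions
    attended = attn @ regions                  # (n, d) attended image vector per word
    relevance = F.cosine_similarity(words, attended, dim=-1)  # (n,) per-word relevance
    return relevance.mean()                    # average pooling over words

k_regions, n_words, dim = 36, 12, 1024
score = scan_similarity(torch.randn(k_regions, dim), torch.randn(n_words, dim))
print(float(score))
```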
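SpatialSense frames VSRC as deciding whether a stated relation between a subject and an object actually holds in the image. The sketch below shows one way such an example could be represented; the field names and the toy left-of baseline are illustrative assumptions rather than the dataset's actual schema, while the nine listed relations are those the dataset defines.

```python
# Illustrative sketch of a SpatialSense-style VSRC example: a subject and an
# object (each a phrase plus bounding box), one of nine spatial predicates,
# and a binary label saying whether the stated relation holds in the image.
# Field names are assumptions for illustration, not the dataset's JSON schema.
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

PREDICATES = ["above", "behind", "in", "in front of", "next to",
              "on", "to the left of", "to the right of", "under"]

@dataclass
class VSRCExample:
    image_path: str
    subject_phrase: str
    subject_box: Box
    object_phrase: str
    object_box: Box
    predicate: str        # one of PREDICATES
    label: bool           # does the stated relation actually hold?

def naive_left_of(subj: Box, obj: Box) -> bool:
    """Toy 2D-only baseline: compare bounding-box centers along the x-axis."""
    return (subj[0] + subj[2]) / 2 < (obj[0] + obj[2]) / 2

ex = VSRCExample("img_001.jpg", "the dog", (10, 40, 80, 120),
                 "the bench", (150, 60, 300, 160), "to the left of", True)
print(naive_left_of(ex.subject_box, ex.object_box) == ex.label)
```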
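For Visual Genome, a minimal sketch of iterating over the relationship annotations to collect (subject, predicate, object) triplets, assuming the relationships.json file from the official download is available at a placeholder path; the exact entity field names vary across releases, so the name lookup below is defensive.

```python
# Minimal sketch: collecting (subject, predicate, object) triplets from
# Visual Genome relationship annotations. Assumes relationships.json from the
# Visual Genome download is available locally (placeholder path below); entity
# records use either "name" (str) or "names" (list) depending on the release.
import json
from collections import Counter

def _entity_name(entity):
    """Return a readable name whether the record uses 'name' or 'names'."""
    return entity.get("name") or (entity.get("names") or ["?"])[0]

def load_triplets(path="data/visual_genome/relationships.json"):  # placeholder path
    with open(path) as f:
        images = json.load(f)                  # one entry per image
    triplets = []
    for img in images:
        for rel in img.get("relationships", []):
            triplets.append((_entity_name(rel["subject"]),
                             rel["predicate"].lower(),
                             _entity_name(rel["object"])))
    return triplets

if __name__ == "__main__":
    triplets = load_triplets()
    print(f"{len(triplets)} triplets; most common predicates:",
          Counter(p for _, p, _ in triplets).most_common(5))
```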