Image-Text Matching - Groups

Flickr30K and MSCOCO

The datasets used in the paper are Flickr30K and MSCOCO, both of which serve image-text matching and image captioning tasks.

Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset containing about 108,000 images, each with 15-30 annotated entities, attributes, and relationships.

MS-COCO

Large-scale datasets [18, 17, 27, 6] have boosted the quality of text-conditional image generation. However, in some domains it can be difficult to build such datasets and usually it could...

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

5 datasets found
