Image-Text Retrieval - Groups

LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Imag...

Image-text retrieval (ITR) is the task of retrieving relevant images or texts given a query from the other modality. The conventional dense retrieval paradigm relies on encoding...

CC14M

Large-scale image-text dataset for pre-training a collaborative two-stream vision-language model for cross-modal retrieval.

CC4M

Large-scale image-text datasets for pre-training a collaborative two-stream vision-language model for cross-modal retrieval.

XmediaNet

The XmediaNet dataset is a large-scale image-text dataset for cross-modal retrieval.

Pascal-Sentence

The Pascal-Sentence dataset contains image-text pairs for cross-modal retrieval.

NUS-WIDE dataset

The NUS-WIDE dataset is a large-scale image-text dataset, which is suitable for feature-partitioned collaborative learning. The dataset contains 100,000 images with 1000 text...

Conceptual Captions

This dataset is used in the paper "Scaling Laws of Synthetic Images for Model Training" for supervised image classification and zero-shot classification tasks.

Conceptual Captions 3M

The Conceptual Captions 3M dataset is a large-scale image-text dataset used for vision-language pre-training.

Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval, providing a substantial collection of images with associated captions.

SBU Captions

The SBU Captions dataset is a large-scale image-text dataset used for vision-language pre-training.

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing roughly 108,000 images, each with 15-30 annotated entities, attributes, and relationships.

MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

13 datasets found