89 datasets found

Groups: Multimodal Learning · Formats: JSON

  • Conceptual 12M

    Conceptual 12M (CC12M) is a dataset of roughly 12 million image-text pairs for vision-and-language pre-training and automatic image captioning.
  • LLaVA-1.5

    The visual instruction-tuning data used to train LLaVA-1.5, a multimodal large language model built on LLaMA; the 7-billion-parameter figure refers to the model, not the dataset.
  • MSR-VTT

    MSR-VTT is a large-scale video description dataset for bridging video and language. It contains 10k web video clips, each paired with natural-language sentence descriptions.
  • VoxCeleb

    VoxCeleb is a large-scale audio-visual dataset of interview speech from celebrities, widely used for speaker identification and verification.
  • YouTube-8M

    YouTube-8M is a large-scale video classification benchmark of millions of YouTube videos annotated with machine-generated topic labels.
  • Video Captioning Dataset

    A video captioning dataset generated by pseudo-labeling videos with image captioning models.
  • MNIST-SVHN-Text dataset

    The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels.
  • COCO

    COCO (Common Objects in Context) is a large-scale dataset for object detection, segmentation, and image captioning.
  • MSCOCO

    MSCOCO is a large-scale dataset annotated with the positions of human body joints, supporting object detection, instance segmentation, keypoint-based human pose estimation, and image captioning.
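Since the results above are filtered to JSON-format datasets, a minimal sketch of filtering catalog entries by keyword, assuming hypothetical metadata records with `name` and `description` fields (these field names are illustrative, not a real API):

```python
import json

# Hypothetical JSON metadata mirroring a few entries from the listing above;
# the schema ("name", "description") is an assumption for illustration.
records_json = """[
  {"name": "MSR-VTT", "description": "large-scale video description dataset"},
  {"name": "YouTube-8M", "description": "large-scale video classification benchmark"},
  {"name": "COCO", "description": "object detection, segmentation, and captioning"}
]"""

records = json.loads(records_json)

def filter_by_keyword(records, keyword):
    """Return names of records whose description mentions the keyword (case-insensitive)."""
    return [r["name"] for r in records if keyword.lower() in r["description"].lower()]

print(filter_by_keyword(records, "video"))  # ['MSR-VTT', 'YouTube-8M']
```

A faceted search UI like the one above applies the same idea server-side, intersecting matches across facets such as group and format.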