Visual Question Answering - Groups

Image Captioning and Visual Question Answering

The dataset is used for image captioning and visual question answering.

Dataset
JSON

VQA 1.0

The VQA 1.0 dataset is a large-scale dataset for visual question answering, containing 15,000 images with 50,000 questions.

Dataset
JSON

Conceptual Captions 12M

The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.

Dataset
JSON

VQAv2

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...

Dataset
JSON

Conceptual Captions

The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.

Dataset
JSON

LLaVA-1.5

The dataset used in this paper is a multimodal large language model (LLaMA) dataset, specifically LLaVA-1.5, which consists of 7 billion parameters and is used for multimodal...

Dataset
JSON

MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

Dataset
JSON

Microsoft COCO

The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...

Dataset
JSON

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

Dataset
JSON

9 datasets found