Dataset - LDM

ClipMD

Medical image-text matching tasks
- Dataset
- JSON

Meta-VQA

The Meta-VQA dataset is a modification of the VQA v2.0 dataset for visual question answering, composed of 1234 unique tasks (questions), split into 870 training tasks and 373...
- Dataset
- JSON

RefCOCO, RefCOCO+, and RefCOCOg

Visual grounding is the task of locating a target object in an image from a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.
- Dataset
- JSON

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing about 108,000 images, each with 15-30 annotated entities, attributes, and relationships.
- Dataset
- JSON

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint of the human body in a given image. HPE supports a wide range of downstream tasks such as...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
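
The API itself is not described on this page, so the following is only a minimal sketch of working with registry entries once they have been retrieved as JSON. The payload shape and the field names `name`, `description`, and `formats` are assumptions made for illustration, not the registry's documented schema, and the helper `find_datasets` is hypothetical. No network call is made.

```python
import json

# Hypothetical payload mirroring the five entries listed above.
# Field names ("name", "description", "formats") are assumed, not documented.
SAMPLE = """
[
  {"name": "ClipMD",
   "description": "Medical image-text matching tasks",
   "formats": ["JSON"]},
  {"name": "Meta-VQA",
   "description": "Modification of the VQA v2.0 dataset for Visual-Question-Answering",
   "formats": ["JSON"]},
  {"name": "RefCOCO, RefCOCO+, and RefCOCOg",
   "description": "Visual grounding datasets",
   "formats": ["JSON"]},
  {"name": "Visual Genome",
   "description": "Large-scale visual question answering dataset",
   "formats": ["JSON"]},
  {"name": "MSCOCO",
   "description": "Human pose estimation",
   "formats": ["JSON"]}
]
"""


def find_datasets(payload: str, keyword: str) -> list[str]:
    """Return names of entries whose name or description contains keyword.

    Matching is case-insensitive and treats hyphens as spaces, so that
    "question answering" also matches "Question-Answering".
    """
    def normalize(text: str) -> str:
        return text.lower().replace("-", " ")

    needle = normalize(keyword)
    records = json.loads(payload)
    return [
        record["name"]
        for record in records
        if needle in normalize(record["name"])
        or needle in normalize(record["description"])
    ]


if __name__ == "__main__":
    for name in find_datasets(SAMPLE, "question answering"):
        print(name)
```

If the real API returns a different structure, only the key lookups inside `find_datasets` would need to change.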

5 datasets found