5 datasets found

Tags: Image-Text Matching

  • ClipMD

    Medical image-text matching tasks
  • Meta-VQA

    The Meta-VQA dataset is a modification of the VQA v2.0 dataset for visual question answering, composed of 1234 unique tasks (questions), split into 870 training tasks and 373...
  • RefCOCO, RefCOCO+, and RefCOCOg

    Visual grounding is the task of locating a target object in an image from a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.
  • Visual Genome

    The Visual Genome dataset is a large-scale visual question answering dataset containing roughly 108,000 images, each with 15-30 annotated entities, attributes, and relationships.
  • MSCOCO

    Human Pose Estimation (HPE) aims to estimate the position of each joint of the human body in a given image. HPE supports a wide range of downstream tasks such as...