43 datasets found

Tags: Multimodal

  • VQA 2.0

    The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set containing 83k images and 444k questions, a validation set containing...
  • CREMA-D

    CREMA-D is an audio-visual dataset for the emotion recognition task. Each video contains both facial and acoustic emotional expressions.
  • MIMIC-IV

    MIMIC-IV is a healthcare dataset containing patient information, including vital signs, lab values, and medication administration. The dataset is used to...
You can also access this registry using the API (see API Docs).