43 datasets found

Tags: Multimodal

  • Multimodal Learning Task

    The dataset used in the paper is a multimodal learning task for robots.
  • Multimodal Categorization Task

    The dataset used in the paper is a multimodal categorization task using image data and speech signals.
  • DEAP

    A large-scale city-wise dataset for exploring the relationships among air pollutants and their causal agents over time.
  • MAMI dataset

    The MAMI dataset is a collection of images and text posts used for training and testing the proposed multimodal model for misogyny identification.
  • XD-Violence

    The XD-Violence dataset is a large-scale multimodal video dataset for violence detection. It consists of 4,754 untrimmed videos with a total duration of 217 hours, covering six...
  • Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence ...

    Weakly supervised multimodal violence detection aims to learn a violence detection model by leveraging multiple modalities such as RGB, optical flow, and audio, while only...
  • VQA

    The VQA dataset is a large-scale visual question answering dataset that consists of images paired with questions that require natural language answers.
  • DeepFashion Multimodal dataset

    The DeepFashion Multimodal dataset contains 12,701 full-body images in 24 categories.
  • BraTS 2020 Challenge

    The BraTS 2020 challenge dataset is a multimodal MRI brain tumor segmentation dataset. It contains 369 subjects with 4 MRI modalities (T2 weighted FLAIR, T1 weighted, T1...
  • BraTS 2020

    Automatic segmentation of brain tumors is an essential but challenging step for extracting quantitative imaging biomarkers for accurate tumor detection, diagnosis, prognosis,...
  • Stanford Drone Dataset (SDD)

    The Stanford Drone Dataset (SDD) is a large-scale dataset that consists of 60 aerial-view videos captured by drones over Stanford University. SDD contains positions of more than...
  • MMAUD

    Multimodal anti-UAV dataset for modern miniature drone threats
  • MIMIC-CXR-JPG

    The MIMIC-CXR-JPG dataset comprises 227,835 imaging studies conducted on 64,588 patients who sought treatment at the BIDMC Emergency Department from 2011 to 2016.
  • LADI-VTON

    LADI-VTON: Latent diffusion textual-inversion enhanced model for virtual try-on
  • Multimodal Garment Designer

    Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
  • FashionSD-X

    FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
  • RAVDESS

    The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset features 24 professional actors (12 female, 12 male), offering good-quality performances and...
  • Stanford Alpaca

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...
  • MineCLIP

    The MineCLIP dataset is a large-scale dataset of Minecraft demonstrations.
  • GenRL

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a combination of reinforcement learning and generative models to solve...
You can also access this registry using the API (see API Docs).