Multimodal Learning - Groups - LDM

WavCaps

The WavCaps dataset contains ChatGPT-assisted weakly-labeled audio captioning data.
- Dataset
- JSON
Multimodal Visual Patterns (MMVP) Benchmark

The Multimodal Visual Patterns (MMVP) benchmark is a dataset used to evaluate the visual question answering capabilities of multimodal large language models (MLLMs).
- Dataset
- JSON
Degree Datasets

Degree datasets are constructed by gradually adjusting the degree of alignment between image and text.
- Dataset
- JSON
Multimodal Learning Task

The dataset used in the paper is a multimodal learning task for robots.
- Dataset
- JSON
LLaMA-7B

A benchmark for evaluating the perception ability of Large Vision-Language Models (LVLMs) via various subtasks and scenarios.
- Dataset
- JSON
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of ...

Dysca is a dynamic and scalable benchmark for evaluating the perception ability of Large Vision-Language Models (LVLMs) via various subtasks and scenarios.
- Dataset
- JSON
Multimodal C4 (mmc4)

Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from public webpages contained in the cleaned English c4 corpus.
- Dataset
- JSON
TCGA-OMICS

TCGA-OMICS: A comprehensive dataset of genomic, transcriptomic, and proteomic data from The Cancer Genome Atlas Program
- Dataset
- JSON
MUGEN-GAME

MUGEN-GAME: A large-scale multimodal dataset for video-audio-text understanding and generation
- Dataset
- JSON
Training transitive and commutative multimodal transformers with LoReTTa

Training transitive and commutative multimodal transformers with LoReTTa
- Dataset
- JSON
Towards Empathetic Open-Domain Conversation Models: A New Benchmark and Dataset

A dialogue dataset for open-domain conversation models.
- Dataset
- JSON
Personalizing Dialogue Agents: I Have a Dog, Do You Have Pets Too?

A dialogue dataset for personalizing dialogue agents.
- Dataset
- JSON
PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior

A dialogue dataset with photo sharing behavior for joint image-text modeling.
- Dataset
- JSON
Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically...

A multi-modal dialogue dataset created by replacing text with semantically relevant images.
- Dataset
- JSON
DialogCC: Large-Scale Multi-Modal Dialogue Dataset

A large-scale multi-modal dialogue dataset created with an automatic pipeline that filters by CLIP similarity.
- Dataset
- JSON
MSRVTT-QA

Video question answering (VideoQA) requires systems to understand the visual information and infer an answer for a natural language question from it.
- Dataset
- JSON
VideoIC

VideoIC dataset for automatic live video commenting
- Dataset
- JSON
Livebot

Livebot dataset for automatic live video commenting
- Dataset
- JSON
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live...

Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) for Live Video Commenting
- Dataset
- JSON
InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and ...

InternVid: A large-scale video-text dataset for multimodal understanding and generation.
- Dataset
- JSON
