Dataset - LDM

LLaVA 158k

LLaVA 158k is a large-scale multimodal dataset used for training and testing multimodal large language models.
- Dataset
- JSON
Multimodal Robustness Benchmark

The MMR benchmark is designed to evaluate MLLMs' comprehension of visual content and robustness against misleading questions, ensuring models truly leverage multimodal inputs...
- Dataset
- JSON
Multi-ZOL

Multi-ZOL is a Chinese dataset for Target-oriented Multimodal Sentiment Classification (TMSC). The dataset contains text and image data, where the text data is used to determine...
- Dataset
- JSON
Twitter15 and Twitter17

Twitter15 and Twitter17 are two English datasets for Target-oriented Multimodal Sentiment Classification (TMSC). The datasets contain text and image data, where the text data is...
- Dataset
- JSON
Openclip

Openclip: A large-scale multimodal dataset for vision and language understanding.
- Dataset
- JSON
Degree Datasets

Degree datasets are constructed by gradually adjusting the degree of alignment between image and text.
- Dataset
- JSON
Caption MNIST

Caption MNIST is a synthetic image-text pair dataset built by filling in the missing colors, digits, and positions in the MNIST dataset.
- Dataset
- JSON
XD-Violence

The XD-Violence dataset is a large-scale multimodal video dataset for violence detection. It consists of 4,754 untrimmed videos with a total duration of 217 hours, covering six...
- Dataset
- JSON
TCGA-OMICS

TCGA-OMICS: A comprehensive dataset of genomic, transcriptomic, and proteomic data from The Cancer Genome Atlas Program.
- Dataset
- JSON
MUGEN-GAME

MUGEN-GAME: A large-scale dataset for video-audio-text multimodal understanding and generation.
- Dataset
- JSON
MSRVTT-QA

Video question answering (VideoQA) requires systems to understand the visual information and infer an answer for a natural language question from it.
- Dataset
- JSON
InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and ...

InternVid: A large-scale video-text dataset for multimodal understanding and generation.
- Dataset
- JSON
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese...

WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models.
- Dataset
- JSON
Crisscrossed Captions

The Crisscrossed Captions (CxC) dataset is a multimodal dataset used for training and evaluating the MURAL model.
- Dataset
- JSON
Wikipedia Image Text

The Wikipedia Image Text (WIT) dataset is a large-scale multimodal dataset used for training and evaluating the MURAL model.
- Dataset
- JSON
MURAL

The Multimodal, Multitask Retrieval Across Languages (MURAL) dataset is used for training and evaluating the MURAL model.
- Dataset
- JSON
EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge

- Dataset
- JSON
DeepSense 6G: Large-Scale Real-World Multimodal Sensing and Communication Dat...

Development dataset for the multimodal beam prediction challenge.
- Dataset
- JSON
Multimodal Transformers for Wireless Communications: A Case Study in Beam Pre...

A multimodal transformer deep learning framework for sensing-assisted beam prediction in wireless communications.
- Dataset
- JSON
Youtube2Text-QA

A video question answering task that requires machines to answer questions about videos in natural language.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
