- Multimodal C4 (mmc4)
  Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from public webpages contained in the cleaned English c4 corpus.
- TCGA-OMICS
  TCGA-OMICS: A comprehensive dataset of genomic, transcriptomic, and proteomic data from The Cancer Genome Atlas Program.
- MUGEN-GAME
  MUGEN-GAME: A large-scale multimodal dataset for video-audio-text understanding and generation.
- Training transitive and commutative multimodal transformers with LoReTTa
  A method (LoReTTa) for training transitive and commutative multimodal transformers.
- Towards Empathetic Open-Domain Conversation Models: A New Benchmark and Dataset
  A benchmark and dialogue dataset for empathetic open-domain conversation models.
- Personalizing Dialogue Agents: I Have a Dog, Do You Have Pets Too?
  A dialogue dataset for personalizing dialogue agents with persona profiles.
- PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior
  A dialogue dataset with photo sharing behavior for joint image-text modeling.
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images
  A multi-modal dialogue dataset created by replacing text with semantically relevant images.
- DialogCC: Large-Scale Multi-Modal Dialogue Dataset
  A large-scale multi-modal dialogue dataset created with an automatic pipeline that filters candidate images by CLIP similarity (a minimal sketch of this kind of filtering appears after this list).
- Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
  Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) for live video commenting.
- InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation
  InternVid: A large-scale video-text dataset for multimodal understanding and generation.
- WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
  WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models.
- Crisscrossed Captions
  The Crisscrossed Captions (CxC) dataset is a multimodal dataset used for training and evaluating the MURAL model.
- Wikipedia Image Text
  The Wikipedia Image Text (WIT) dataset is a large-scale multimodal dataset used for training and evaluating the MURAL model.
- EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge
  A multi-instance video-text retrieval challenge based on the EPIC-KITCHENS-100 egocentric video dataset.
- Multimodal Learning (MLM) dataset
  The MLM dataset is a collection of images and captions that represent different cultures from around the world.
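
The DialogCC entry above mentions filtering candidate images by CLIP similarity. The sketch below illustrates that general idea only; it is not DialogCC's published pipeline, and the model checkpoint, similarity threshold, and helper names are assumptions chosen for illustration.

```python
# Minimal sketch of CLIP-similarity filtering for (utterance, image) pairs.
# Assumptions: the "openai/clip-vit-base-patch32" checkpoint and the 0.25
# threshold are illustrative, not values used by DialogCC.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(utterance: str, image: Image.Image) -> float:
    """Cosine similarity between CLIP text and image embeddings."""
    inputs = processor(text=[utterance], images=[image],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return float((text_emb @ image_emb.T).item())

def filter_pairs(pairs, threshold=0.25):
    """Keep only (utterance, image) pairs whose CLIP similarity clears the threshold."""
    return [(u, img) for u, img in pairs if clip_similarity(u, img) >= threshold]
```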