- Multimodal C4 (mmc4)
  Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from public webpages contained in the cleaned English c4 corpus.
- TCGA-OMICS
  TCGA-OMICS: A comprehensive dataset of genomic, transcriptomic, and proteomic data from The Cancer Genome Atlas Program.
- MUGEN-GAME
  MUGEN-GAME: A large-scale multimodal dataset for video-audio-text understanding and generation.
- Training transitive and commutative multimodal transformers with LoReTTa
  A method (LoReTTa) for training transitive and commutative multimodal transformers.
- Towards Empathetic Open-Domain Conversation Models: A New Benchmark and Dataset
  A benchmark and dialogue dataset for empathetic open-domain conversation models.
- Personalizing Dialogue Agents: I Have a Dog, Do You Have Pets Too?
  A dialogue dataset for personalizing dialogue agents with persona profiles.
- PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior
  A dialogue dataset with photo sharing behavior for joint image-text modeling.
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images
  A multi-modal dialogue dataset created by replacing text with semantically relevant images.
- DialogCC: Large-Scale Multi-Modal Dialogue Dataset
  A large-scale multi-modal dialogue dataset created with an automatic pipeline that filters candidate images by CLIP similarity (a minimal sketch of this kind of filtering appears after this list).
- Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
  Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) for live video commenting.
- InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation
  InternVid: A large-scale video-text dataset for multimodal understanding and generation.
- WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
  WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models.
- Crisscrossed Captions
  The Crisscrossed Captions (CxC) dataset is a multimodal dataset used for training and evaluating the MURAL model.
- Wikipedia Image Text
  The Wikipedia Image Text (WIT) dataset is a large-scale multimodal dataset used for training and evaluating the MURAL model.
- EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge
  A multi-instance video-text retrieval challenge based on the EPIC-KITCHENS-100 egocentric video dataset.
- Multimodal Learning (MLM) dataset
  The MLM dataset is a collection of images and captions that represent different cultures from around the world.
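
The DialogCC entry above mentions filtering candidate images by CLIP similarity. The sketch below illustrates that general idea only; it is not DialogCC's published pipeline, and the model checkpoint, similarity threshold, and helper names are assumptions chosen for illustration.

```python
# Minimal sketch of CLIP-similarity filtering for (utterance, image) pairs.
# Assumptions: the "openai/clip-vit-base-patch32" checkpoint and the 0.25
# threshold are illustrative, not values used by DialogCC.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(utterance: str, image: Image.Image) -> float:
    """Cosine similarity between CLIP text and image embeddings."""
    inputs = processor(text=[utterance], images=[image],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return float((text_emb @ image_emb.T).item())

def filter_pairs(pairs, threshold=0.25):
    """Keep only (utterance, image) pairs whose CLIP similarity clears the threshold."""
    return [(u, img) for u, img in pairs if clip_similarity(u, img) >= threshold]
```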