- WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese...
WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models.
- Crisscrossed Captions
The Crisscrossed Captions (CxC) dataset is a multimodal learning dataset used for training and evaluating the MURAL model.
- Wikipedia Image Text
The Wikipedia Image Text (WIT) dataset is a large-scale multimodal learning dataset used for training and evaluating the MURAL model.
- EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge
EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge.
- Multimodal Learning (MLM) dataset
The MLM dataset is a collection of images and captions that represent different cultures from around the world.
- Stanford Large Movie, Games and Datasets Archive (SMLMDA)
The Stanford Large Movie, Games and Datasets Archive (SMLMDA) dataset is used for training and evaluation.
- DeepSense 6G: Large-Scale Real-World Multimodal Sensing and Communication Dat...
Development dataset for the multimodal beam prediction challenge.
- Multimodal Transformers for Wireless Communications: A Case Study in Beam Pre...
A multimodal transformer deep learning framework for sensing-assisted beam prediction in wireless communications.
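As a rough illustration of such an architecture (not the paper's actual design), the PyTorch sketch below assumes one feature vector per sensing modality (e.g., camera, GPS, radar), projects each to a common width, encodes the resulting tokens jointly with a transformer, and classifies over candidate beam indices; all modality names, dimensions, and layer choices are assumptions.

import torch
import torch.nn as nn

class BeamPredictor(nn.Module):
    # Hypothetical multimodal transformer for beam prediction: one token per
    # sensing modality, jointly encoded, then classified over beam indices.
    def __init__(self, modality_dims=(2048, 2, 256), d_model=128, num_beams=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in modality_dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_beams)

    def forward(self, modalities):
        # modalities: list of (batch, dim_m) tensors, one per sensing modality
        tokens = torch.stack([p(m) for p, m in zip(self.proj, modalities)], dim=1)
        encoded = self.encoder(tokens)          # (batch, num_modalities, d_model)
        return self.head(encoded.mean(dim=1))   # logits over candidate beams

# Example with random features for a batch of 8 samples
model = BeamPredictor()
feats = [torch.randn(8, d) for d in (2048, 2, 256)]
print(model(feats).shape)  # torch.Size([8, 64])
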
- Multimodal Contrastive Learning
The dataset used in the paper is a collection of pairs of observations (x_i, x̃_i) from two modalities, where x_i ∈ R^{d_1} and x̃_i ∈ R^{d_2}. The dataset is used to evaluate the...
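To make the paired-modalities setup concrete, the sketch below implements a symmetric InfoNCE-style contrastive loss over such pairs, assuming each modality is mapped into a shared embedding space by illustrative linear projections; the projections, dimensions, and temperature are assumptions, not the paper's actual objective.

import numpy as np

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis
    m = logits.max(axis=axis, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=axis, keepdims=True))

def contrastive_loss(x, x_tilde, w1, w2, temperature=0.07):
    # x: (n, d1) batch from modality 1; x_tilde: (n, d2) paired batch from modality 2.
    # w1: (d1, d) and w2: (d2, d) stand in for learned encoders into a shared space.
    z1 = x @ w1
    z2 = x_tilde @ w2
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature    # (n, n) cosine similarities
    idx = np.arange(len(x))             # matched pairs sit on the diagonal
    loss_12 = -log_softmax(logits, axis=1)[idx, idx].mean()  # modality 1 -> 2
    loss_21 = -log_softmax(logits, axis=0)[idx, idx].mean()  # modality 2 -> 1
    return (loss_12 + loss_21) / 2

# Example with random data: n = 32 pairs, d1 = 64, d2 = 48, shared dimension 16
rng = np.random.default_rng(0)
x, x_tilde = rng.normal(size=(32, 64)), rng.normal(size=(32, 48))
w1, w2 = rng.normal(size=(64, 16)), rng.normal(size=(48, 16))
print(contrastive_loss(x, x_tilde, w1, w2))
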
- Youtube2Text-QA
A video question answering task that requires machines to answer questions about videos in natural language.
- RWTH-PHOENIX-Weather
Continuous sign language recognition (SLR) deals with unaligned video-text pairs and uses the word error rate (WER), i.e., edit distance, as its main evaluation metric.
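For reference, WER is the word-level edit distance between hypothesis and reference divided by the number of reference words; the minimal sketch below (with a made-up sentence pair, not data from the dataset) shows the computation.

def word_error_rate(reference: str, hypothesis: str) -> float:
    # Word-level edit distance (substitutions + insertions + deletions)
    # divided by the number of reference words.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution and one deletion against a four-word reference -> WER = 0.5
print(word_error_rate("the weather is cold", "the weather warm"))
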
- AccidentBlip2
A multimodal large language model for accident detection with multi-view motion reasoning.
- RANKCLIP: Ranking-Consistent Language-Image Pretraining
Self-supervised contrastive learning models, such as CLIP, have set new benchmarks for vision-language models in many downstream tasks. However, their dependency on rigid...
- Kosmos-2: Grounding multimodal large language models to the world
Kosmos-2: Grounding multimodal large language models to the world.
- Visual instruction tuning
Visual instruction tuning.
- Flamingo: a visual language model for few-shot learning
Flamingo: a visual language model for few-shot learning.
- Audio-visual scene-aware dialog
Audio-visual scene-aware dialog.
- ChatBridge
ChatBridge is a multimodal language model capable of perceiving real-world multimodal information, as well as following instructions, thinking, and interacting with humans in...
- Flickr30k entities: Collecting region-to-phrase correspondences for richer im...
A dataset for multimodal learning tasks, focusing on region-to-phrase correspondences for image-to-sentence models.