WIT: Wikipedia-based image text dataset for multimodal multilingual machine learning
A large-scale multimodal, multilingual dataset of image-text examples extracted from Wikipedia articles, spanning more than 100 languages.
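If you want to poke at WIT directly, a minimal loading sketch is below. It assumes the wikimedia/wit_base mirror on the Hugging Face Hub (the repository id is an assumption, not something stated in this listing) and uses streaming to avoid downloading the full corpus.

```python
from itertools import islice
from datasets import load_dataset

# Assumed Hub mirror of WIT; verify the repository id before relying on it.
# Streaming mode iterates over records without a full local download.
wit = load_dataset("wikimedia/wit_base", split="train", streaming=True)

# Inspect the first few records' fields rather than assuming a schema.
for example in islice(wit, 3):
    print(sorted(example.keys()))
```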
ShapeNeRF–Text
The ShapeNeRF–Text dataset consists of 40K paired NeRFs and language annotations for ShapeNet objects.
Video-LLaMA: An instruction-tuned audio-visual language model for video understanding
A multimodal framework that equips a large language model with understanding of both the visual and auditory content of videos.
VideoChat: Chat-centric video understanding
A video-based instruction dataset for video understanding, comprising 100k videos with detailed captions.
Valley: A Video Assistant with Large Language Model Enhanced Ability
A large multimodal instruction-following dataset for video understanding, comprising 37k conversation pairs, 26k complex reasoning QA pairs, and 10k detail description instruction pairs.
The Hateful Memes dataset
The Hateful Memes dataset aims to help develop models that more effectively detect multimodal hateful content.
IMAGINE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with the text references. This is different from human language processing, in which visual imagination often aids comprehension.
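For context, here is a minimal sketch of the embedding-level comparison that conventional NLG metrics rely on: embed the candidate and the reference, then score them by cosine similarity. The sentence-transformers model name and example strings are illustrative assumptions; this is not the IMAGINE metric itself.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative embedding-level NLG scoring. The encoder choice is an
# assumption; any general-purpose sentence encoder would do here.
model = SentenceTransformer("all-MiniLM-L6-v2")

candidate = "a man rides a horse along the beach"
reference = "someone is riding a horse on the shore"

# Embed both sentences and compare them in embedding space.
emb = model.encode([candidate, reference], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(f"embedding-level similarity: {score:.3f}")
```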
Hateful Memes Challenge
The Hateful Memes challenge set is a multimodal dataset containing more than 10,000 new examples that combine images and text.
Uniter dataset
The Uniter dataset is a multimodal learning dataset of images paired with corresponding text, used to train the UNITER (UNiversal Image-TExt Representation) model.
End-to-End Referring Video Object Segmentation with Multimodal Transformers
The referring video object segmentation task (RVOS) involves segmentation of a text-referred object instance in the frames of a given video.
Multimodal Variational Autoencoder for Cardiac Hemodynamics Instability Detection
A multimodal variational autoencoder for low-cost detection of cardiac hemodynamic instability from chest X-ray (CXR) and electrocardiogram (ECG) data.
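To make the idea of a multimodal VAE concrete, below is a generic two-encoder sketch in PyTorch: separate encoders for CXR and ECG features, a shared latent space, and per-modality decoders. All layer sizes, the pre-extracted feature-vector inputs, and the late-fusion design are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalVAE(nn.Module):
    """Toy multimodal VAE: encode each modality, fuse, sample a shared
    latent, and reconstruct both modalities. Sizes are assumptions."""

    def __init__(self, cxr_dim=1024, ecg_dim=256, latent_dim=32):
        super().__init__()
        self.enc_cxr = nn.Sequential(nn.Linear(cxr_dim, 256), nn.ReLU())
        self.enc_ecg = nn.Sequential(nn.Linear(ecg_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.dec_cxr = nn.Linear(latent_dim, cxr_dim)
        self.dec_ecg = nn.Linear(latent_dim, ecg_dim)

    def forward(self, cxr, ecg):
        # Late fusion: concatenate per-modality encodings.
        h = torch.cat([self.enc_cxr(cxr), self.enc_ecg(ecg)], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec_cxr(z), self.dec_ecg(z), mu, logvar

def vae_loss(cxr, ecg, rec_cxr, rec_ecg, mu, logvar):
    # Reconstruction terms for both modalities plus the KL penalty.
    rec = F.mse_loss(rec_cxr, cxr) + F.mse_loss(rec_ecg, ecg)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Smoke test with random feature vectors.
model = MultimodalVAE()
cxr, ecg = torch.randn(8, 1024), torch.randn(8, 256)
loss = vae_loss(cxr, ecg, *model(cxr, ecg))
print(loss.item())
```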
BiomedCLIP
BiomedCLIP: a CLIP-style model pretrained on image-text pairs extracted from the PubMed Central repository.
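A minimal zero-shot usage sketch follows; it assumes the open_clip library and the BiomedCLIP checkpoint published on the Hugging Face Hub. The hub id, local image file, and label prompts are assumptions to be checked against the model card.

```python
import torch
import open_clip
from PIL import Image

# Assumed Hub checkpoint id for BiomedCLIP; verify against the model card.
HUB_ID = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"

model, preprocess = open_clip.create_model_from_pretrained(HUB_ID)
tokenizer = open_clip.get_tokenizer(HUB_ID)

# Hypothetical local image; labels are illustrative zero-shot prompts.
image = preprocess(Image.open("example.png")).unsqueeze(0)
labels = ["chest X-ray", "brain MRI", "histopathology slide"]
texts = tokenizer([f"this is a photo of a {label}" for label in labels])

with torch.no_grad():
    # Embed image and texts, normalize, and compare by cosine similarity.
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```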