TimeIT: A Video-Centric Instruction-Tuning Dataset
TimeIT is a video-centric dataset designed for instruction tuning. It comprises 6 diverse tasks, 12 widely used academic benchmarks, and a total of 125K...
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
TimeChat is a time-sensitive multimodal large language model specifically designed for long video understanding. It incorporates two key architectural contributions: a...
Voice Aging with Audio-Visual Style Transfer
Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform a person's appearance to look younger or older. Identity is maintained by...
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
The dataset from SemEval-2021 Task 6: detection of persuasion techniques in texts and images, approached using CLIP features.
Reuters Video-Language News Dataset
The Reuters Video-Language News Dataset (ReutersViLNews) is a large-scale video-language understanding dataset containing 1,974 long-form news videos with an average video...
Hateful Memes Dataset
The Hateful Memes Dataset consists of a training set of 8,500 images, a dev set of 500 images, and a test set of 1,000 images. The meme text is present on the images, but also...
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Unpaired vision-language pre-training via cross-modal CutMix.
PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) leverages heterogeneous data sources to interpret the complex nature of human sentiments.
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
This paper introduces FocusCLIP, an enhancement for CLIP pretraining using a new ROI...
QVHighlights
QVHighlights is a dataset for video highlight detection, which consists of over 10,000 videos annotated with human-written text queries. -
Multimodal Visual Patterns (MMVP) Benchmark
The Multimodal Visual Patterns (MMVP) benchmark is a dataset used to evaluate the visual question answering capabilities of multimodal large language models (MLLMs). -
Multimodal C4 (mmc4)
Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from public webpages contained in the cleaned English c4 corpus. -
Multimodal Learning (MLM) dataset
The MLM dataset is a collection of images and captions that represent different cultures from around the world. -
Stanford Large Movie, Games and Datasets Archive (SMLMDA)
The SMLMDA dataset is used for training and evaluation.
Multimodal Contrastive Learning
The dataset used in the paper is a collection of paired observations (x_i, x̃_i) from two modalities, where x_i ∈ ℝ^{d_1} and x̃_i ∈ ℝ^{d_2}. The dataset is used to evaluate the...
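The paired-modality setup above is the standard input to a contrastive objective: each (x_i, x̃_i) pair is treated as a positive, and all other pairings in the batch as negatives. A minimal sketch of a symmetric InfoNCE loss over such pairs, assuming both modalities have already been projected to a shared k-dimensional space and L2-normalized (the specific encoders and temperature are illustrative, not from the paper):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE loss over n paired embeddings.

    z1: (n, k) embeddings of x_i; z2: (n, k) embeddings of x~_i.
    Positives sit on the diagonal of the similarity matrix.
    """
    logits = z1 @ z2.T / temperature          # (n, n) pairwise similarities
    idx = np.arange(len(z1))                  # positive index for each row

    def xent(l):
        # cross-entropy of the diagonal entries, numerically stabilized
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the two retrieval directions (modality 1 -> 2 and 2 -> 1)
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly matched pairs should yield a lower loss than shuffled (mismatched) pairs, which is what such an evaluation typically measures.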