Dataset - LDM

MAD: A Large-Scale Benchmark for Long-Form Video Temporal Grounding

MAD: A large-scale benchmark for long-form video temporal grounding, containing over 384K natural language queries derived from high-quality audio description of mainstream...
- Dataset
- JSON
Quda: Natural Language Queries for Visual Data Analytics

A dataset of natural language queries for visual data analytics.
- Dataset
- JSON
Audio retrieval with natural language queries

The AudioCaps and Clotho datasets were used to build baselines for text-based audio retrieval.
- Dataset
- JSON
QVHighlights: Detecting moments and highlights in videos via natural language...

QVHighlights: Detecting moments and highlights in videos via natural language queries
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

4 datasets found