Dataset - LDM

YouCook2

YouCook2 consists of recipes containing labels that separate the long-horizon trajectories of demonstrations into events, with explicit timestamps for the beginning and end of...
- Dataset
- JSON
Learning to compose topic-aware mixture of experts for zero-shot video captio...

The dataset is used for zero-shot video captioning.
- Dataset
- JSON
MSR Video to Text (MSR-VTT)

The MSR-VTT dataset is a large-scale video captioning benchmark that contains 10,000 video clips with 200,000 descriptions.
- Dataset
- JSON
Microsoft Video Description Corpus (MSVD)

The MSVD dataset is a public video captioning benchmark that contains 1,970 short video clips with 80,000 descriptions.
- Dataset
- JSON
VATEX

VATEX is a large-scale multilingual video description dataset with English and Chinese captions. The paper uses it for video question answering and video-language pre-training.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

5 datasets found