Video Captioning - Groups

YouCook2

YouCook2 consists of recipes containing labels that separate the long-horizon trajectories of demonstrations into events, with explicit timestamps for the beginning and end of...

Dataset
JSON

SoccerNet-Caption

A dataset for dense video captioning of soccer broadcast commentary.

Dataset
JSON

Streamlined dense video captioning

Streamlined dense video captioning.

Dataset
JSON

ActivityNet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Vi...

Contextual reasoning is essential to understanding events in long untrimmed videos. In this work, we systematically explore different captioning models with various contexts for...

Dataset
JSON

MSRVTT

MSRVTT is a large-scale dataset for video captioning. It contains 10k video clips, each accompanied by 20 human-edited English sentence descriptions,...

Dataset
JSON

MSR Video to Text (MSR-VTT)

The MSR-VTT dataset is a large-scale video captioning benchmark that contains 10,000 video clips with 200,000 descriptions.

Dataset
JSON

Microsoft Video Description Corpus (MSVD)

The MSVD dataset is a public video captioning benchmark that contains 1,970 short video clips with 80,000 descriptions.

Dataset
JSON

Dense-captioning events in videos

Dense-captioning events in videos.

Dataset
JSON

VATEX

The dataset used in the paper is a video question answering dataset, used for large-scale video-language pre-training.

Dataset
JSON

MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...

Dataset
JSON

MSR-VTT

The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...

Dataset
JSON

UCF101

The UCF101 dataset contains 13,320 videos distributed across 101 action categories. This dataset differs from the ones above in that it contains mostly coarse sports activities...

Dataset
JSON

Video Captioning Dataset

A video captioning dataset generated by pseudo-labeling videos with image captioning models.

Dataset
JSON

13 datasets found