Video Description - Groups

Microsoft Research Video Description Corpus (MSVD)

The MSVD dataset is a collection of 1,970 open-domain video clips from YouTube, each annotated with variable-length captions.

Dataset
JSON

YouCook

A dataset of cooking videos with multiple sentence descriptions.

Dataset
JSON

Movie Description dataset

A novel dataset of movies with aligned descriptions sourced from movie scripts and DVS (Descriptive Video Service) audio descriptions.

Dataset
JSON

Discriminative Training: Learning to Describe Video with Sentences

A collection of video clips paired with sentential labels, used to learn word meanings from complex and realistic video.

Dataset
JSON

Grounded Video Description

A dataset for grounded video description, in which words in a description are linked to the video regions they refer to.

Dataset
JSON

TACoS

A dataset of videos with multiple sentence descriptions, used for activity recognition and video description tasks.

Dataset
JSON

CGCaption

A large-scale video description dataset for bridging video and language.

Dataset
JSON

HowTo100M

A large-scale dataset of narrated instructional videos from YouTube, in which video clips are paired with automatically transcribed narrations.

Dataset
JSON

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

A large video description dataset for bridging video and language.

Dataset
JSON

MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...

Dataset
JSON

ActivityNet Captions

ActivityNet Captions is a benchmark dataset proposed for dense video captioning. There are 20K untrimmed videos in total, and each video has several annotated segments with...

Dataset
JSON

MSR-VTT

MSR-VTT is a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...

Dataset
JSON

12 datasets found