Video Retrieval - Groups

AVSD dataset

The AVSD dataset is a benchmark for audio-visual scene-aware dialog. It consists of 7659 training, 734 prototype validation, and 733 prototype testing dialogs, where the...

CSL-Daily

CSL-Daily is a Chinese Sign Language (CSL) dataset that focuses mainly on people’s daily lives. It includes 18401, 1077, and 1176 available examples in the training, validation,...

PHOENIX-2014T

PHOENIX-2014T is a German Sign Language (DGS) dataset consisting mainly of weather forecasts from TV broadcasts. It contains 7096, 519, and 642 video-text pairs in...

How2Sign

How2Sign is a large-scale continuous American Sign Language (ASL) dataset. After removing invalid text-video pairs, we retain 31019, 1738, and 2348 available pairs in the...

Condensed Movies

A dataset used for text-to-video retrieval and video classification.

EclipSE: Efficient Long-range Video Retrieval using Sight and Sound

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

LSMDC

The LSMDC movie description dataset consists of 118,081 short video clips extracted from 202 movies, each annotated with a single caption.

VATEX

VATEX is a large-scale multilingual video description dataset, with captions in English and Chinese.

MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods rely on image-to-video transfer learning...

ActivityNet Captions

ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video has several annotated segments with...

MSR-VTT

MSR-VTT is a large video description dataset for bridging video and language. It contains 10K video clips with lengths varying from 10 to...

12 datasets found