-
PHOENIX-2014T
PHOENIX-2014T is a German Sign Language (DGS) dataset that mainly contains weather forecast content from TV programs. It consists of 7096, 519, and 642 video-text pairs in...
-
Condensed Movies
A dataset used for text-to-video retrieval and video classification tasks.
-
EclipSE: Efficient Long-range Video Retrieval using Sight and Sound
-
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
-
ActivityNet Captions
ActivityNet Captions is a benchmark dataset proposed for dense video captioning. There are 20K untrimmed videos in total, and each video has several annotated segments with...