11 datasets found

  • CSL-Daily

    CSL-Daily is a Chinese Sign Language (CSL) dataset that mainly focuses on people’s daily lives. It includes 18401, 1077, and 1176 available examples in the training, validation,...
  • PHOENIX-2014T

    PHOENIX-2014T is a German Sign Language (DGS) dataset that mainly includes weather forecast content from TV programs. It consists of 7096, 519, and 642 video-text pairs in...
  • How2Sign

    How2Sign is a large-scale continuous American Sign Language (ASL) dataset. After removing invalid text-video pairs, we retain 31019, 1738, and 2348 available pairs in the...
  • Condensed Movies

    A dataset used for text-to-video retrieval and video classification tasks.
  • EclipSE: Efficient Long-range Video Retrieval using Sight and Sound

    EclipSE: Efficient Long-range Video Retrieval using Sight and Sound
  • Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

    Frozen in time: A joint video and image encoder for end-to-end retrieval.
  • LSMDC

    The LSMDC movie description dataset consists of 118,081 short video clips extracted from 202 movies, each annotated with a single caption.
  • VATEX

    The dataset used in the paper is a video question answering dataset, applied to large-scale video-language pre-training.
  • MSVD

    Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...
  • ActivityNet Captions

    ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video has several annotated segments with...
  • MSR-VTT

    The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...