5 datasets found

Tags: temporal activity localization

  • Temporal Sentence Grounding in Videos

    Temporal sentence grounding in videos (TSGV) is the task of retrieving the video segment that semantically corresponds to a natural-language query.
  • TACoS

    A dataset of videos with multiple sentence descriptions, used for activity recognition and video description tasks.
  • Charades-STA

    The Charades-STA dataset contains 12,408/3,720 segment-sentence pairs and 5,338/1,334 videos in the training and test sets, respectively.
  • DiDeMo

    DiDeMo is a large-scale video-text dataset containing 10,000 videos and 40,000 annotations.
  • ActivityNet Captions

    ActivityNet Captions is a benchmark dataset proposed for dense video captioning. There are 20K untrimmed videos in total, and each video has several annotated segments with...
You can also access this registry using the API (see API Docs).