6 datasets found

Tags: natural language video grounding

  • TACoS Speech

    The TACoS Speech dataset contains a large number of open-world videos with more shot transitions.
  • Charades-STA Speech

    The Charades-STA Speech dataset contains a large number of open-world videos with more shot transitions.
  • ActivityNet Speech

    The ActivityNet Speech dataset contains a large number of open-world videos with more shot transitions.
  • TACoS

    A dataset of videos with multiple sentence descriptions, used for activity recognition and video description tasks.
  • Charades-STA

    The Charades-STA dataset contains 12,408/3,720 segment-sentence pairs and 5,338/1,334 videos in the training and test sets, respectively.
  • ActivityNet Captions

    ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video has several annotated segments with...
You can also access this registry using the API (see API Docs).