3 datasets found

Tags: video-language

  • MSRVTT-QA

    Video question answering (VideoQA) requires a system to understand the visual content of a video and infer the answer to a natural-language question about it.
  • MSVD-QA

    The MSVD-QA dataset is a benchmark for video question answering, containing 1,970 video clips paired with open-ended questions.
  • MSR-VTT

    MSR-VTT is a large video description dataset for bridging video and language. It contains 10k video clips with length varying from 10 to...
You can also access this registry using the API (see API Docs).