2 datasets found

  • Reuters Video-Language News Dataset

    The Reuters Video-Language News Dataset (ReutersViLNews) is a large-scale video-language understanding dataset containing 1,974 long-form news videos with an average video...
  • EgoSchema

    EgoSchema is a diagnostic benchmark for assessing very long-form video-language understanding capabilities of modern multimodal systems.