Video Language Understanding - Groups

Reuters Video-Language News Dataset

The Reuters Video-Language News Dataset (ReutersViLNews) is a large-scale video-language understanding dataset containing 1,974 long-form news videos with an average video...
- Dataset
- JSON

EgoSchema

EgoSchema is a diagnostic benchmark for assessing very long-form video-language understanding capabilities of modern multimodal systems.
- Dataset
- JSON

2 datasets found