Streamlined dense video captioning
ActivityNet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Vi...
Contextual reasoning is essential to understand events in long untrimmed videos. In this work, we systematically explore different captioning models with various contexts for...
MSR Video to Text (MSR-VTT)
The MSR-VTT dataset is a large-scale video captioning benchmark that contains 10,000 video clips with 200,000 descriptions.
Microsoft Video Description Corpus (MSVD)
The MSVD dataset is a public video captioning benchmark that contains 1,970 short video clips with 80,000 descriptions.
Dense-captioning events in videos
Video Captioning Dataset
A video captioning dataset generated by pseudolabeling videos with image captioning models.