ActivityNet Captions

The ActivityNet Captions is a benchmark dataset proposed for dense video captioning. There are 20K untrimmed videos in total, and each video has several annotated segments with starting and ending times as well as the associated captions.

BibTex: