ActivityNet Captions

ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total. Each video has several annotated segments, and each segment is described by a start time, an end time, and an associated caption.
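
The per-video structure described above (several segments, each with a start time, an end time, and a caption) can be sketched as a small data model. This is a minimal illustration, not the dataset's official loader: the field names `timestamps` and `sentences`, the helper `parse_annotations`, and the sample record are assumptions introduced here, since this page does not specify a file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    caption: str  # sentence describing the segment


def parse_annotations(raw: Dict[str, dict]) -> Dict[str, List[Segment]]:
    """Pair each (start, end) timestamp with its caption, per video.

    Assumes each entry holds parallel lists under the keys
    "timestamps" and "sentences" (an assumed layout, see above).
    """
    videos: Dict[str, List[Segment]] = {}
    for video_id, entry in raw.items():
        timestamps = entry["timestamps"]
        sentences = entry["sentences"]
        if len(timestamps) != len(sentences):
            raise ValueError(f"{video_id}: timestamp/sentence count mismatch")
        videos[video_id] = [
            Segment(float(start), float(end), text.strip())
            for (start, end), text in zip(timestamps, sentences)
        ]
    return videos


# Hypothetical record, made up for illustration (not real dataset content).
sample = {
    "v_example": {
        "duration": 30.0,
        "timestamps": [[0.0, 12.5], [10.0, 30.0]],
        "sentences": ["A person picks up a guitar.", " They begin to play a song."],
    }
}

segments = parse_annotations(sample)
```

Segments are stored as a list per video rather than assumed to be disjoint, because dense captioning annotations on untrimmed video may overlap in time, as the two segments in the sample do.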

Cite this as

Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles (2024). Dataset: ActivityNet Captions. https://doi.org/10.57702/lxpldj5v

DOI retrieved: December 2, 2024

Additional Info

Created: December 2, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.2211.11427
Citation:
  • https://doi.org/10.48550/arXiv.2112.01062
  • https://doi.org/10.48550/arXiv.2210.12977
  • https://doi.org/10.48550/arXiv.2210.15977
  • https://doi.org/10.1109/TPAMI.2023.3258628
  • https://doi.org/10.48550/arXiv.2205.08508
  • https://doi.org/10.48550/arXiv.2403.14174
Author: Ranjay Krishna
More Authors: Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
Homepage: https://activitynet.org/