ActivityNet Captions

doi:10.57702/lxpldj5v



ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video is annotated with several temporal segments. Every segment has a start time, an end time, and an associated caption.
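As a rough illustration of the segment-plus-caption structure described above, the sketch below flattens per-video annotations into individual (start, end, caption) records. It assumes a JSON-like layout in which each video ID maps to parallel "timestamps" and "sentences" lists. That layout, the field names, and the sample record are assumptions made for illustration and are not specified on this page; check the files distributed with the dataset before relying on them.

```python
from typing import Iterator, NamedTuple


class Segment(NamedTuple):
    video_id: str
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    caption: str


def iter_segments(annotations: dict) -> Iterator[Segment]:
    """Yield one Segment per annotated (timestamp, sentence) pair.

    Assumes each entry has parallel "timestamps" and "sentences" lists
    (an assumed layout, not confirmed by this page).
    """
    for video_id, entry in annotations.items():
        pairs = zip(entry["timestamps"], entry["sentences"])
        for (start, end), sentence in pairs:
            yield Segment(video_id, float(start), float(end), sentence.strip())


# Hypothetical record, made up for illustration only.
sample = {
    "v_example": {
        "duration": 30.0,
        "timestamps": [[0.0, 12.5], [10.0, 30.0]],
        "sentences": [
            "A person walks into a kitchen.",
            " The person starts cooking.",
        ],
    }
}

segments = list(iter_segments(sample))
for seg in segments:
    print(f"{seg.video_id} [{seg.start:.1f}s - {seg.end:.1f}s] {seg.caption}")
```

Segments within one video may overlap in time, as in the sample record, so downstream code should not assume they partition the video.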

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles (2024). Dataset: ActivityNet Captions. https://doi.org/10.57702/lxpldj5v

DOI retrieved: December 2, 2024

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2211.11427
Citation	https://doi.org/10.48550/arXiv.2112.01062, https://doi.org/10.48550/arXiv.2210.12977, https://doi.org/10.48550/arXiv.2210.15977, https://doi.org/10.1109/TPAMI.2023.3258628, https://doi.org/10.48550/arXiv.2205.08508, https://doi.org/10.48550/arXiv.2403.14174
Author	Ranjay Krishna
More Authors	Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
Homepage	https://activitynet.org/