
MSR-VTT

The paper uses MSR-VTT, a large video description dataset for bridging video and language. It contains 10k video clips, each between 10 and 32 seconds long, and every clip comes with 20 related captions for training.
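The figures in the description can be turned into a quick sanity check on a local copy of the data. The sketch below is only an illustration: the `Clip` record, its field names, and the helper functions are hypothetical and are not part of any official MSR-VTT tooling. It also assumes that "10k" means exactly 10,000 clips.

```python
from dataclasses import dataclass, field
from typing import List

# Figures taken from the description above; "10k" is assumed to mean 10,000.
NUM_CLIPS = 10_000
CAPTIONS_PER_CLIP = 20
MIN_SECONDS, MAX_SECONDS = 10.0, 32.0


@dataclass
class Clip:
    """Hypothetical record for one video clip and its captions."""
    video_id: str
    duration_seconds: float
    captions: List[str] = field(default_factory=list)


def matches_description(clip: Clip) -> bool:
    """Return True if the clip fits the stated length range and caption count."""
    in_range = MIN_SECONDS <= clip.duration_seconds <= MAX_SECONDS
    return in_range and len(clip.captions) == CAPTIONS_PER_CLIP


def total_caption_pairs(num_clips: int = NUM_CLIPS,
                        per_clip: int = CAPTIONS_PER_CLIP) -> int:
    """Number of clip-caption pairs implied by the description."""
    return num_clips * per_clip
```

Under these assumptions, `total_caption_pairs()` gives 200,000 clip-caption pairs, and `matches_description` can flag records in a local copy that fall outside the stated ranges.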

Data and Resources

This dataset has no data

Cite this as

Jun Xu, Tao Mei, Ting Yao, Yong Rui (2024). Dataset: MSR-VTT. https://doi.org/10.57702/hi8ky096

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2109.14084
Citation	https://doi.org/10.48550/arXiv.2305.10474 https://doi.org/10.48550/arXiv.2405.16009 https://doi.org/10.48550/arXiv.2111.12476 https://doi.org/10.48550/arXiv.2404.13425 https://doi.org/10.48550/arXiv.2310.12190 https://doi.org/10.48550/arXiv.2106.05438 https://doi.org/10.48550/arXiv.2406.08656 https://doi.org/10.48550/arXiv.2312.07509 https://doi.org/10.48550/arXiv.2103.15686 https://doi.org/10.48550/arXiv.2211.11427 https://doi.org/10.48550/arXiv.2307.09972 https://doi.org/10.48550/arXiv.2402.03161 https://doi.org/10.48550/arXiv.2209.13853 https://doi.org/10.1609/aaai.v37i3.25483 https://doi.org/10.48550/arXiv.1804.05448 https://doi.org/10.48550/arXiv.2007.02503 https://doi.org/10.48550/arXiv.2311.13073 https://doi.org/10.48550/arXiv.2312.08870 https://doi.org/10.48550/arXiv.1704.01502 https://doi.org/10.48550/arXiv.2402.04324 https://doi.org/10.48550/arXiv.2205.08508
Author	Jun Xu
More Authors	Tao Mei, Ting Yao, Yong Rui
Homepage	https://msrvtt.github.io/