MSR-VTT

doi:10.57702/hi8ky096

MSR-VTT is a large video description dataset for bridging video and language. It contains 10k video clips, each between 10 and 32 seconds long, and every clip is provided with 20 related captions for training.
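
The per-clip caption count in the description above can be checked with a short script once the annotations are loaded. The following is a minimal sketch only: it assumes the captions have already been read into (video_id, caption) pairs, which is a hypothetical layout and not a documented MSR-VTT file format, and the names `group_captions` and `videos_with_unexpected_counts` are illustrative.

```python
from collections import defaultdict

# Caption count per clip as stated in the dataset description.
CAPTIONS_PER_VIDEO = 20


def group_captions(pairs):
    """Group (video_id, caption) pairs into {video_id: [captions]}.

    The pair layout is an assumption made for illustration; adapt the
    loading step to whatever annotation file you actually have.
    """
    grouped = defaultdict(list)
    for video_id, caption in pairs:
        grouped[video_id].append(caption)
    return dict(grouped)


def videos_with_unexpected_counts(grouped, expected=CAPTIONS_PER_VIDEO):
    """Return {video_id: count} for clips whose caption count differs from expected."""
    return {vid: len(caps) for vid, caps in grouped.items() if len(caps) != expected}


# Toy data: one clip with the expected 20 captions, one with a single caption.
toy_pairs = [("video0", f"caption {i}") for i in range(20)]
toy_pairs.append(("video1", "a lone caption"))

toy_grouped = group_captions(toy_pairs)
print(videos_with_unexpected_counts(toy_grouped))
```

On the full dataset, an empty result from `videos_with_unexpected_counts` would be consistent with the stated 20 captions per clip.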

Data and Resources

Original Metadata (JSON)
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Jun Xu, Tao Mei, Ting Yao, Yong Rui (2024). Dataset: MSR-VTT. https://doi.org/10.57702/hi8ky096

DOI retrieved: December 2, 2024

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2109.14084
Citation	https://doi.org/10.48550/arXiv.2305.10474 https://doi.org/10.48550/arXiv.2405.16009 https://doi.org/10.48550/arXiv.2111.12476 https://doi.org/10.48550/arXiv.2404.13425 https://doi.org/10.48550/arXiv.2310.12190 https://doi.org/10.48550/arXiv.2106.05438 https://doi.org/10.48550/arXiv.2406.08656 https://doi.org/10.48550/arXiv.2312.07509 https://doi.org/10.48550/arXiv.2103.15686 https://doi.org/10.48550/arXiv.2211.11427 https://doi.org/10.48550/arXiv.2307.09972 https://doi.org/10.48550/arXiv.2402.03161 https://doi.org/10.48550/arXiv.2209.13853 https://doi.org/10.1609/aaai.v37i3.25483 https://doi.org/10.48550/arXiv.1804.05448 https://doi.org/10.48550/arXiv.2007.02503 https://doi.org/10.48550/arXiv.2311.13073 https://doi.org/10.48550/arXiv.2312.08870 https://doi.org/10.48550/arXiv.1704.01502 https://doi.org/10.48550/arXiv.2402.04324 https://doi.org/10.48550/arXiv.2205.08508
Author	Jun Xu
More Authors	Tao Mei, Ting Yao, Yong Rui
Homepage	https://msrvtt.github.io/