Microsoft Video Description Corpus (MSVD)

The MSVD dataset is a public video captioning benchmark of 1,970 short YouTube video clips, annotated with roughly 80,000 descriptions.
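With 1,970 clips and about 80,000 descriptions, each clip has on average roughly 40 reference captions. Because of this, captioning evaluation commonly scores a generated caption against all of a clip's references rather than a single one. The sketch below is illustrative only. It computes that average from the stated counts and groups captions by clip. The `(clip_id, caption)` pair layout, the clip IDs and the `group_by_clip` helper are hypothetical and are not the corpus's official distribution format.

```python
from collections import defaultdict

# Counts as stated in the dataset description.
NUM_CLIPS = 1_970
NUM_DESCRIPTIONS = 80_000

# Average number of reference descriptions per clip (about 40.6).
avg_per_clip = NUM_DESCRIPTIONS / NUM_CLIPS


def group_by_clip(pairs):
    """Group (clip_id, caption) pairs into {clip_id: [caption, ...]}.

    Hypothetical layout for illustration; not the official MSVD file format.
    """
    refs = defaultdict(list)
    for clip_id, caption in pairs:
        refs[clip_id].append(caption)
    return dict(refs)


if __name__ == "__main__":
    print(f"{avg_per_clip:.1f} descriptions per clip on average")
    sample = [
        ("clip_a", "a man is playing a guitar"),
        ("clip_a", "someone plays the guitar"),
        ("clip_b", "a cat is eating"),
    ]
    # Each clip ID maps to its list of reference captions.
    print(group_by_clip(sample))
```

The grouped form (one clip mapped to many captions) is the shape that multi-reference metrics typically expect as input.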

BibTeX: