5 datasets found

Tags: Description

  • MSR Video to Text (MSR-VTT)

    The MSR-VTT dataset is a large-scale video captioning benchmark that contains 10,000 video clips with 200,000 descriptions.
  • Microsoft Video Description Corpus (MSVD)

    The MSVD dataset is a public video captioning benchmark that contains 1,970 short video clips with 80,000 descriptions.
  • text2fabric

    A comprehensive, large-scale public dataset relating the visual appearance of fabrics to natural language.
  • MSVD

    Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods rely on image-to-video transfer learning...
  • MSR-VTT

    The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...
You can also access this registry using the API (see API Docs).