InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation

Cite this as

Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin (2024). Dataset: InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation. https://doi.org/10.57702/4jomly5t

DOI retrieved: December 16, 2024

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Defined In    https://doi.org/10.48550/arXiv.2407.13773
Author        Conghui He
More Authors  Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin
Homepage      https://arxiv.org/abs/2307.06942