InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation

doi:10.57702/4jomly5t

InternVid is a large-scale video-text dataset for multimodal understanding and generation.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin (2024). Dataset: InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation. https://doi.org/10.57702/4jomly5t

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2407.13773
Author	Conghui He
More Authors	Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin
Homepage	https://arxiv.org/abs/2307.06942