Cite this as

Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin (2024). Dataset: InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation. Resource: Original Metadata. https://doi.org/10.57702/4jomly5t

DOI retrieved: December 16, 2024

Additional Information

Field         Value
Created       December 16, 2024
Last updated  December 16, 2024
Format        JSON