Valley: A Video Assistant with Large Language Model Enhanced Ability

doi:10.57702/5o0eqfc5

A large multi-modal instruction-following dataset for video understanding, comprising 37k conversation pairs, 26k complex reasoning QA pairs, and 10k detailed description instruction pairs.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei (2024). Dataset: Valley: A Video Assistant with Large Language Model Enhanced Ability. https://doi.org/10.57702/5o0eqfc5

DOI retrieved: December 3, 2024

Additional Info

Field	Value
Created	December 3, 2024
Last update	December 3, 2024
Defined In	https://doi.org/10.48550/arXiv.2306.07207
Author	Ruipu Luo
More Authors	Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei
Homepage	https://arxiv.org/abs/2305.06500