Valley: A Video Assistant with Large Language Model Enhanced Ability

A large multi-modal instruction-following dataset for video understanding, comprising 37k conversation pairs, 26k complex reasoning QA pairs and 10k detail description instruction pairs.

Data and Resources

This dataset has no data

Cite this as

Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei (2024). Dataset: Valley: A Video Assistant with Large Language Model Enhanced Ability. https://doi.org/10.57702/5o0eqfc5

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 3, 2024
Last update	December 3, 2024
Defined In	https://doi.org/10.48550/arXiv.2306.07207
Author	Ruipu Luo
More Authors	Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei
Homepage	https://arxiv.org/abs/2305.06500