Valley: A Video Assistant with Large Language Model Enhanced Ability

A large multi-modal instruction-following dataset for video understanding, comprising 37k conversation pairs, 26k complex reasoning QA pairs and 10k detail description instruction pairs.

Data and Resources

This dataset has no data

Cite this as

Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei (2024). Dataset: Valley: A Video Assistant with Large Language Model Enhanced Ability. https://doi.org/10.57702/5o0eqfc5

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field: Value
Created: December 3, 2024
Last update: December 3, 2024
Defined In: https://doi.org/10.48550/arXiv.2306.07207
Author: Ruipu Luo
More Authors: Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei
Homepage: https://arxiv.org/abs/2305.06500