- TimeIT: A Video-Centric Instruction-Tuning Dataset
  TimeIT is a video-centric dataset designed for instruction tuning. It is composed of 6 diverse tasks, 12 widely used academic benchmarks, and a total of 125K...
- QVHighlights
  QVHighlights is a dataset for video highlight detection, consisting of over 10,000 videos annotated with human-written text queries.
- Video-LLaMA: An instruction-tuned audio-visual language model for video under...
  A Video-LLaMA model for video understanding, comprising 100k videos with detailed captions.
- VideoChat: Chat-centric video understanding
  A video-based instruction dataset for video understanding, comprising 100k videos with detailed captions.
- Valley: A Video Assistant with Large Language Model Enhanced Ability
  A large multi-modal instruction-following dataset for video understanding, comprising 37k conversation pairs, 26k complex reasoning QA pairs and 10k detail description...