
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

TimeChat is a time-sensitive multimodal large language model specifically designed for long video understanding. It incorporates two key architectural contributions: a timestamp-aware frame encoder that binds visual content with the timestamp of each frame, and a sliding video Q-Former that produces a video token sequence of varying lengths to accommodate videos of various durations.
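To illustrate the idea, the minimal sketch below (not the official implementation) shows how a sliding video Q-Former can turn per-frame features into a token sequence whose length grows with video duration. It assumes frame features are pre-extracted by a vision encoder, approximates the timestamp-aware step with a sinusoidal timestamp embedding (TimeChat itself fuses timestamp text with frame features inside an image Q-Former), and stands in for the Q-Former with a single cross-attention block; the class and function names are illustrative only.

    # Minimal sketch under the assumptions stated above; not TimeChat's actual code.
    import math
    import torch
    import torch.nn as nn


    def timestamp_embedding(seconds: torch.Tensor, dim: int) -> torch.Tensor:
        """Sinusoidal embedding of per-frame timestamps (in seconds)."""
        half = dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
        angles = seconds[:, None] * freqs[None, :]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)  # [T, dim]


    class SlidingVideoQFormer(nn.Module):
        """Compress frame tokens window-by-window; output length scales with duration."""

        def __init__(self, dim=768, num_queries=32, window=32, stride=32, heads=8):
            super().__init__()
            self.window, self.stride = window, stride
            self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                     nn.GELU(), nn.Linear(dim * 4, dim))

        def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
            # frame_tokens: [T, D] timestamp-aware frame features
            outputs = []
            for start in range(0, frame_tokens.size(0), self.stride):
                chunk = frame_tokens[start:start + self.window].unsqueeze(0)  # [1, W, D]
                q = self.queries.unsqueeze(0)                                 # [1, Q, D]
                attended, _ = self.cross_attn(q, chunk, chunk)
                outputs.append((attended + self.ffn(attended)).squeeze(0))    # [Q, D]
            # Number of windows, and thus output tokens, scales with video length.
            return torch.cat(outputs, dim=0)  # [num_windows * Q, D]


    if __name__ == "__main__":
        T, D = 96, 768                             # 96 sampled frames (hypothetical)
        feats = torch.randn(T, D)                  # stand-in for ViT frame features
        secs = torch.linspace(0, 190, T)           # frame timestamps in seconds
        frame_tokens = feats + timestamp_embedding(secs, D)
        video_tokens = SlidingVideoQFormer()(frame_tokens)
        print(video_tokens.shape)                  # 3 windows * 32 queries = [96, 768]

A longer video sampled into more frames would simply yield more windows, and hence a longer video token sequence, which is the property the sliding design is meant to provide.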

Data and Resources

This dataset has no data

Cite this as

Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou (2025). Dataset: TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding. https://doi.org/10.57702/z8j50qno

Private DOI: this DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined In: https://doi.org/10.48550/arXiv.2312.02051
Author: Shuhuai Ren
More Authors: Linli Yao, Shicheng Li, Xu Sun, Lu Hou