- Slot-VLM: SlowFast Slots for Video-Language Modeling
  Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the...
- Verb-Focused Contrastive Pretraining
  The dataset used in the paper for verb-focused contrastive pretraining.