ActivityNet, MSR-VTT, and MSVD
The datasets used in the paper are ActivityNet, MSR-VTT, and MSVD. The authors use them for text-to-video retrieval.
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
Fine-grained adaptation of the popular CLIP model across multiple datasets.