Text-to-Video Retrieval - Groups

ActivityNet, MSR-VTT, and MSVD

The paper uses the ActivityNet, MSR-VTT, and MSVD datasets for text-to-video retrieval.
- Dataset
- JSON
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos

Fine-grained adaptation of the popular CLIP model across multiple datasets.
- Dataset
- JSON

2 datasets found