3 datasets found
Tags: cross-modal learning
Places
Places, the dataset used in the paper, is a large collection of 400k pairs, each consisting of an image from the Places 205 dataset and a corresponding spoken audio caption.
EPIC: Leveraging Per Image-Token Consistency for Vision-Language Pre-training
The proposed EPIC method is a pre-training approach that leverages more text tokens for learning vision-language associations.
MSR-VTT
The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...
You can also access this registry using the API (see API Docs).