3 datasets found

VoxCeleb: A Large-Scale Speaker Identification Dataset

AudioCaps
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.

Clotho
Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.