Dataset - LDM

WavCaps

The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
- Dataset
- JSON

Clotho v2

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.
- Dataset
- JSON

AudioCaps

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.
- Dataset
- JSON

Clotho

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.
- Dataset
- JSON

Clotho: An audio captioning dataset

Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio. Most audio captioning methods are based on deep neural...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

5 datasets found