Audio Captioning - Groups

WavCaps

The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.

Clotho v2

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.

AudioCaps

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.

Clotho

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.

Clotho: An audio captioning dataset

Audio captioning is a multi-modal task that uses natural language to describe the contents of general audio. Most audio captioning methods are based on deep neural...

5 datasets found
