Clotho

Clotho is a dataset for automated audio captioning, a cross-modal translation task in which the content of an audio clip is described in natural-language sentences.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz (2024). Dataset: Clotho. https://doi.org/10.57702/c1snqbd4

DOI retrieved: November 25, 2024

Field	Value
Created	November 25, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2307.14335
Citation	https://doi.org/10.1109/TASLP.2024.3416686, https://doi.org/10.48550/arXiv.2308.05037, https://doi.org/10.48550/arXiv.2309.03340, https://doi.org/10.48550/arXiv.2104.13553, https://doi.org/10.48550/arXiv.2307.13005
Author	Arvind Krishna Sridhar
More Authors	Yinyi Guo, Erik Visser, Rehana Mahfuz
Homepage	https://arxiv.org/abs/2010.03615