Diffusion Models for Minimally-Supervised Speech Synthesis

doi:10.57702/rb2ur70h

A minimally-supervised speech synthesis method based on diffusion models. It introduces the CTAP method as an intermediate semantic representation and uses mel-spectrograms as the acoustic representation.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang (2024). Dataset: Diffusion Models for Minimally-Supervised Speech Synthesis. https://doi.org/10.57702/rb2ur70h

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2309.15512
Author	Chunyu Qiang
More Authors	Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang