Diffusion Models for Minimally-Supervised Speech Synthesis

A minimally-supervised speech synthesis method based on diffusion models. It introduces CTAP as an intermediate semantic representation and uses mel-spectrograms as the acoustic representation.

Cite this as

Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang (2024). Dataset: Diffusion Models for Minimally-Supervised Speech Synthesis. https://doi.org/10.57702/rb2ur70h

DOI retrieved: December 16, 2024

Additional Info

Field: Value
Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2309.15512
Author: Chunyu Qiang
More Authors: Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang