- Diffusion Models for Minimally-Supervised Speech Synthesis
A minimally-supervised speech synthesis method based on diffusion models. It introduces the CTAP method as an intermediate semantic representation and uses...
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
FastDiff is a fast conditional diffusion model for high-quality speech synthesis. It employs a stack of time-aware location-variable convolutions with diverse receptive field...