WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
WaveGrad 2 is a non-autoregressive generative model for text-to-speech synthesis. It is trained to estimate the gradient of the log conditional density of the waveform given a...
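A minimal sketch of the iterative-refinement idea follows. It assumes a DDPM-style update of the kind used in the original WaveGrad. Sampling starts from Gaussian noise. Each step removes an estimate of the noise, which is a scaled estimate of the gradient of the log density, and then re-injects a smaller amount of fresh noise. The trained network is replaced by a hypothetical `oracle_eps` that already knows a toy clean signal. The linear schedule in `linear_betas` is an assumption. WaveGrad 2's phoneme conditioning and its actual noise schedule are not reproduced.

```python
import math
import random


def linear_betas(n_steps, beta_start=1e-4, beta_end=0.05):
    # Hypothetical linear noise schedule; WaveGrad 2's real schedule may differ.
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def oracle_eps(y, target, alpha_bar):
    # Stand-in for the trained network (hypothetical): the exact noise that
    # turns the toy clean signal `target` into `y` at this noise level.
    # It relates to the score via grad log q(y) = -eps / sqrt(1 - alpha_bar).
    s = math.sqrt(alpha_bar)
    d = math.sqrt(1.0 - alpha_bar)
    return [(yi - s * ti) / d for yi, ti in zip(y, target)]


def refine(target, n_steps=50, seed=0):
    rng = random.Random(seed)
    betas = linear_betas(n_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product of alphas
    # Start from pure Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in target]
    for n in reversed(range(n_steps)):
        eps = oracle_eps(y, target, alpha_bars[n])
        coef = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        # Remove the estimated noise and rescale (posterior mean).
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:
            # Re-inject noise with the posterior std; no noise on the last step.
            var = (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n]
            sigma = math.sqrt(var)
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

Because the oracle estimator is exact, the final step lands on the target up to floating-point rounding. A learned estimator would only approximate it, which is why the quality of the gradient estimate matters.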
ASVspoof 2021
The ASVspoof 2021 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including...
BOFFIN TTS: Few-shot Speaker Adaptation by Bayesian Optimization
BOFFIN TTS is a novel approach for few-shot speaker adaptation. The task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances.
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
ASVspoof 2019
The ASVspoof 2019 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including...
FastSpeech2
FastSpeech2 is a text-to-speech model that uses a WaveGAN-based vocoder.
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models...
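"Single-shot" sparse training is assumed here to build on SNIP-style pruning at initialization, as the name suggests. Under that scheme, each weight is scored once by its connection sensitivity |w · ∂L/∂w| on a batch, and only the highest-scoring fraction is kept. The sketch shows only that scoring step, on a hypothetical toy linear model. `linear_grads` and `snip_mask` are illustrative helpers. How SNIPER Training schedules sparsity for TTS is not reproduced.

```python
def linear_grads(weights, xs, ys):
    # Gradient of the mean squared error of a toy linear model y_hat = w . x
    # (hypothetical stand-in for backpropagating through a TTS network).
    n = len(xs)
    grads = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / n
    return grads


def snip_mask(weights, grads, keep_ratio):
    # Connection sensitivity |w * dL/dw|; keep the top `keep_ratio` fraction.
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    # 1 = weight survives pruning, 0 = weight is zeroed before training.
    return [1 if i in keep else 0 for i in range(len(scores))]
```

The mask is computed once, before training, and the pruned weights stay at zero afterwards. That one-time scoring is what distinguishes single-shot pruning from iterative magnitude pruning.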
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Zero-shot text-to-speech aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping...
FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech
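FastSpeech 2 is a fast and high-quality end-to-end text-to-speech system. It maps phoneme sequences to mel-spectrograms. It is trained jointly on spectrogram reconstruction and on predicting duration, pitch, and energy in its variance adaptor. Its FastSpeech 2s variant generates waveforms directly from text.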
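FastSpeech-style models need a way to turn a short phoneme sequence into a much longer sequence of acoustic frames. They use a length regulator for this: each phoneme's hidden vector is repeated for as many frames as its duration. The sketch assumes integer durations are already available. In FastSpeech 2, these come from forced alignment during training and from the duration predictor at inference. `length_regulate` is a hypothetical helper that uses plain lists rather than tensors.

```python
def length_regulate(phoneme_states, durations):
    # Expand phoneme-level vectors to frame level by repeating each one
    # durations[i] times (a duration of 0 drops that phoneme's frames).
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames = []
    for state, d in zip(phoneme_states, durations):
        if d < 0:
            raise ValueError("durations must be non-negative integers")
        # Copy the vector so frames do not alias the same list object.
        frames.extend(list(state) for _ in range(d))
    return frames


# Usage: three phonemes with durations of 2, 0 and 1 frames.
frames = length_regulate([[1.0], [2.0], [3.0]], [2, 0, 1])
```

Because the output length is fixed by the durations rather than discovered step by step, all frames can be decoded in parallel. This is the source of the model's speed advantage over autoregressive decoders.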