-
TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. -
Generative Pre-Training for Speech
Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate... -
JSUT corpus
JSUT is a free, single-speaker Japanese speech corpus designed for end-to-end speech synthesis. -
ASVspoof 2019
The ASVspoof 2019 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including... -
KSS dataset and LJSpeech dataset
The Korean Single Speaker Speech (KSS) dataset and the LJSpeech dataset are used for speech synthesis experiments. -
ParallelWaveGAN
ParallelWaveGAN is a non-autoregressive, WaveNet-based vocoder trained with an adversarial loss, which lets it generate all waveform samples in parallel. -
FastSpeech2
FastSpeech2 is a non-autoregressive text-to-speech model that predicts mel-spectrograms; a separate vocoder, such as ParallelWaveGAN, converts them to waveforms. -
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models... -
LJSpeech-1.1
The LJSpeech-1.1 dataset contains approximately 24 hours of single-speaker English speech sampled at 22,050 Hz. -
Reverberation and Noise Contaminated Speech Datasets
Training and test datasets were generated by contaminating the clean data with reverberation and noise. -
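
One common way to contaminate clean speech is to convolve it with a room impulse response and then add noise scaled to a target signal-to-noise ratio. The sketch below illustrates that recipe in plain Python; it is not taken from the source, and the function names, the toy impulse response, and the 10 dB SNR are all assumptions made for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Linear convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Add reverberation, then noise at the requested SNR (hypothetical helper)."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    reverberant = convolve(clean, impulse_response)
    noise = noise[: len(reverberant)]
    # Choose the gain so that 10 * log10(P_reverberant / P_scaled_noise) == snr_db.
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberant, noise)]


# Toy example: a 440 Hz tone, a three-tap decaying "room", Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rir = [1.0, 0.0, 0.5, 0.0, 0.25]  # assumed impulse response, not a measured one
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
noisy = contaminate(clean, rir, noise, snr_db=10.0)
```

In practice the impulse responses and noise recordings would come from measured or simulated corpora, and a vectorized convolution would replace the explicit loops.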
Proprietary Speech Dataset
The proprietary speech dataset consists of 184 hours of high-quality US English speech spoken by 11 female and 10 male speakers. -
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive No...
SpecGrad adapts the spectral envelope of the diffusion noise to the conditioning log-mel spectrogram. -
Surprise Test Set
The surprise test set is used for evaluating the performance of the proposed system. -
English Test Set
The English test set is used for evaluating the performance of the proposed system. -
ZeroSpeech Challenge 2019
The ZeroSpeech Challenge 2019 dataset is used for unsupervised unit discovery and for training a multi-scale code2spec inverter. -
Non-Attentive Tacotron
Non-Attentive Tacotron is a neural text-to-speech model that combines a robust duration predictor with an autoregressive decoder.
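
To illustrate how predicted durations can stand in for attention, the sketch below expands token-level encodings to frame level by repeating each token for its predicted number of frames. This is the simplest form of duration-based upsampling and is shown only as an assumption-laden illustration; the model's own upsampling mechanism may differ, and `upsample_by_duration` is a hypothetical name.

```python
def upsample_by_duration(token_encodings, durations):
    """Repeat each token encoding for its (integer) number of output frames."""
    if len(token_encodings) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames = []
    for encoding, n_frames in zip(token_encodings, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([encoding] * n_frames)
    return frames


# Toy example: strings stand in for encoder output vectors.
tokens = ["h", "e", "y"]
durations = [2, 1, 3]  # assumed predictor output, already rounded to frames
frames = upsample_by_duration(tokens, durations)
print(frames)  # ['h', 'h', 'e', 'y', 'y', 'y']
```

The decoder then consumes one frame-level encoding per output step, so the alignment between text and speech is fixed by the durations rather than learned by an attention mechanism.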