-
Chinese Prosody Prediction Dataset
The dataset used in the paper on automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. -
Aozorabunko dataset
The Aozorabunko dataset, used for pre-training the PnG BERT model. -
Wikipedia2 and Aozorabunko datasets
The Wikipedia2 and Aozorabunko datasets, used for pre-training the PnG BERT model. -
Speech Corpus
A speech corpus of size 7,000 used for training and validation of the FCI module. -
TIMIT Corpus
The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks. -
Internal Dataset
The internal dataset contains 6 million real-world driving scenarios from Las Vegas (LV), Seattle (SEA), San Francisco (SF), and the campus of the Stanford Linear Accelerator... -
Corpus and voices for Catalan speech synthesis
Corpus and voices for Catalan speech synthesis. -
Voice Bank Corpus
The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers. -
Generative Pre-Training for Speech
Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate... -
JSUT corpus
The dataset is a large vocabulary Japanese accent dictionary built using the proposed technique. -
ASVspoof 2019
The ASVspoof 2019 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including... -
KSS dataset and LJSpeech dataset
The Korean Single Speaker Speech (KSS) dataset and the LJSpeech dataset, used for speech synthesis experiments. -
FastSpeech
The FastSpeech dataset is a text-to-speech dataset used to train the FastSpeech model. -
LJSpeech-1.1
The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz. -
Reverberation and Noise Contaminated Speech Datasets
Training and test datasets were generated by contaminating the clean data with reverberation and noise. -
Proprietary Speech Dataset
The proprietary speech dataset consists of 184 hours of high-quality US English speech spoken by 11 female and 10 male speakers. -
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive No...
SpecGrad is a proposed neural vocoder that adapts the spectral envelope of the diffusion noise based on the conditioning log-mel spectrogram.