-
The ICSI Meeting Corpus
The ICSI Meeting Corpus -
Proprietary dataset
Proprietary dataset consisting of 57 hours of Korean speech recorded by 38 professional voice actors. -
LibriTTS-R
The LibriTTS-R dataset, used as a reference speech dataset for the proposed TTSDS benchmark. -
TTSDS Benchmark
The dataset used for the proposed TTSDS benchmark, which includes 35 TTS systems developed between 2008 and 2024. -
Neural Codec Language Models
Neural codec language models are zero-shot text-to-speech synthesizers. -
Chinese Prosody Prediction Dataset
Dataset used in the paper on automatic prosody prediction for Chinese speech synthesis with BLSTM-RNN and embedding features. -
Aozorabunko dataset
The Aozorabunko dataset, used for pre-training the PnG BERT model. -
Wikipedia and Aozorabunko datasets
The Wikipedia and Aozorabunko datasets, used for pre-training the PnG BERT model. -
Diffusion Models for Minimally-Supervised Speech Synthesis
A minimally-supervised speech synthesis method based on diffusion models. Introduces the CTAP method as an intermediate semantic representation and uses... -
Speech Corpus
A speech corpus of size 7,000 used for training and validation of the FCI module. -
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS.... -
Development of HMM-based Indonesian speech synthesis
Development of HMM-based Indonesian speech synthesis. -
TIMIT Corpus
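The TIMIT corpus is a database of read speech from 630 speakers covering eight major dialects of American English, each reading ten phonetically rich sentences. It is used for speaker recognition and speech synthesis tasks.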