Speech Synthesis - Groups

The ICSI Meeting Corpus

The ICSI Meeting Corpus
- Dataset
- JSON
BVCC

The BVCC dataset is a newly collected MOS dataset that contains 7,106 English samples from previous Blizzard Challenges for TTS and Voice Conversion Challenges.
- Dataset
- JSON
Proprietary dataset

Proprietary dataset consisting of 57 hours of Korean speech recorded by 38 professional voice actors.
- Dataset
- JSON
LibriTTS-R

The LibriTTS-R dataset is used as a reference speech dataset for the proposed TTSDS benchmark.
- Dataset
- JSON
TTSDS Benchmark

The dataset used for the proposed TTSDS benchmark, which includes 35 TTS systems developed between 2008 and 2024.
- Dataset
- JSON
MIX

MIX dataset
- Dataset
- JSON
VioLA

Unified codec language models for speech recognition, synthesis, and translation.
- Dataset
- JSON
Neural Codec Language Models

Neural codec language models are zero-shot text to speech synthesizers.
- Dataset
- JSON
AudioPaLM

A large language model that can speak and listen.
- Dataset
- JSON
CodecFake

A comprehensive collection of contemporary codec models, resulting in the creation of the CodecFake dataset.
- Dataset
- JSON
Chinese Prosody Prediction Dataset

The dataset used in the paper for automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features.
- Dataset
- JSON
Aozorabunko dataset

The Aozorabunko dataset, used for pre-training the PnG BERT model.
- Dataset
- JSON
Wikipedia2 and Aozorabunko datasets

The Wikipedia2 and Aozorabunko datasets, used for pre-training the PnG BERT model.
- Dataset
- JSON
Diffusion Models for Minimally-Supervised Speech Synthesis

A minimally-supervised speech synthesis method based on diffusion models. Introduces the CTAP method as an intermediate semantic representation and uses...
- Dataset
- JSON
Speech Corpus

A speech corpus of size 7,000 used for training and validation of the FCI module.
- Dataset
- JSON
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS....
- Dataset
- JSON
Development of HMM-based Indonesian speech synthesis

Development of HMM-based Indonesian speech synthesis.
- Dataset
- JSON
DeviceTTS

A small-footprint, fast, stable network for on-device text-to-speech synthesis.
- Dataset
- JSON
TIMIT Corpus

The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.
- Dataset
- JSON
TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
- JSON

61 datasets found