Internal Dataset
The internal dataset contains 6 million real-world driving scenarios from Las Vegas (LV), Seattle (SEA), San Francisco (SF), and the campus of the Stanford Linear Accelerator...
Corpus and voices for Catalan speech synthesis
Voice Bank Corpus
The Voice Bank Corpus is a large regional-accent speech database containing over 10 hours of speech data from 20 speakers.
Streamwise StyleMelGAN vocoder for wideband speech coding at very low bit rate
A GAN vocoder that generates wideband speech waveforms from parameters coded at 1.6 kbit/s.
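To put the 1.6 kbit/s figure in perspective, a minimal sketch of the compression ratio, assuming uncompressed 16 kHz, 16-bit mono PCM as the wideband reference (an assumption, not stated in the entry):

```python
# Compression ratio of 1.6 kbit/s coded parameters versus an assumed
# uncompressed wideband reference of 16 kHz, 16-bit mono PCM.
pcm_bitrate = 16_000 * 16      # 256,000 bit/s for the PCM reference
coded_bitrate = 1_600          # 1.6 kbit/s coded parameter stream
ratio = pcm_bitrate / coded_bitrate
print(ratio)  # 160.0
```

Under that reference, the coded stream is 160 times smaller than raw PCM.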
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognitio...
The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as...
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model, and an audio synthesis module.
Generative Pre-Training for Speech
Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling a data distribution to generate...
JSUT corpus
The dataset is a large-vocabulary Japanese accent dictionary built using the proposed technique.
ASVspoof 2019
The ASVspoof 2019 dataset is a large-scale public dataset for automatic speaker verification spoofing and countermeasures research. The dataset contains various types of audio files, including...
KSS dataset and LJSpeech dataset
The Korean Single Speaker Speech (KSS) dataset and the LJSpeech dataset are used for the speech synthesis experiments.
ParallelWaveGAN
ParallelWaveGAN is a GAN-based vocoder that generates waveforms in parallel rather than autoregressively.
FastSpeech2
FastSpeech2 is a non-autoregressive text-to-speech model whose predicted mel-spectrograms are converted to waveforms with a GAN-based vocoder.
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models...
LJSpeech-1.1
The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
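From the two figures in this entry, the total sample count can be worked out directly; a minimal sketch of that arithmetic:

```python
# Approximate total sample count of LJSpeech-1.1:
# ~24 hours of single-speaker audio at a 22,050 Hz sampling rate.
hours = 24
sample_rate = 22_050
total_samples = hours * 3600 * sample_rate
print(total_samples)  # 1905120000, i.e. roughly 1.9 billion samples
```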
Reverberation and Noise Contaminated Speech Datasets
Training and test datasets were generated by contaminating the clean data with reverberation and noise.
Proprietary Speech Dataset
The proprietary speech dataset consists of 184 hours of high-quality US English speech from 11 female and 10 male speakers.
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive No...
SpecGrad is a proposed neural vocoder that adapts the spectral envelope of the diffusion noise based on the conditioning log-mel spectrogram.