Speech Synthesis - Groups

TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
- Dataset
- JSON
Generative Pre-Training for Speech

Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate...
- Dataset
- JSON
JSUT corpus

The dataset is a large vocabulary Japanese accent dictionary built using the proposed technique.
- Dataset
- JSON
HiFi-GAN

HiFi-GAN: Generative adversarial networks for efficient and high-fidelity speech synthesis
- Dataset
- JSON
ASVspoof 2019

The ASVspoof 2019 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including...
- Dataset
- JSON
KSS dataset and LJSpeech dataset

The Korean Single Speaker Speech (KSS) dataset and the LJSpeech dataset, as used for speech synthesis experiments.
- Dataset
- JSON
LPCNet

This dataset has no description
- Dataset
- JSON
ParallelWaveGAN

ParallelWaveGAN is a GAN-based vocoder with a non-autoregressive WaveNet generator, so waveform samples are generated in parallel.
- Dataset
- JSON
FastSpeech2

FastSpeech2 is a non-autoregressive text-to-speech model that predicts mel-spectrograms; a separate vocoder, such as Parallel WaveGAN, converts them to waveforms.
- Dataset
- JSON
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models...
- Dataset
- JSON
LJSpeech-1.1

The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
- Dataset
- JSON
Reverberation and Noise Contaminated Speech Datasets

Training and test datasets were generated by contaminating the clean data with reverberation and noise.
- Dataset
- JSON
Proprietary Speech Dataset

The proprietary speech dataset consists of 184 hours of high-quality US English speech spoken by 11 female and 10 male speakers.
- Dataset
- JSON
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive No...

SpecGrad adapts the spectral envelope of the diffusion noise based on the conditioning log-mel spectrogram.
- Dataset
- JSON
MelGAN

MelGAN: Generative adversarial networks for conditional waveform synthesis.
- Dataset
- JSON
WSJ

The WSJ corpus is a large vocabulary continuous speech recognition dataset. It contains 36416 sequences, representing around 80 hours of speech.
- Dataset
- JSON
Surprise Test Set

The surprise test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON
English Test Set

The English test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON
ZeroSpeech Challenge 2019

The ZeroSpeech Challenge 2019 dataset is used for unsupervised unit discovery and for a multi-scale code2spec inverter.
- Dataset
- JSON
Non-Attentive Tacotron

Non-Attentive Tacotron is a neural text-to-speech model that combines a robust duration predictor with an autoregressive decoder.
- Dataset
- JSON

56 datasets found