Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS...
Parallel WaveGAN-based waveform synthesis with voicing-aware conditional discriminators
This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems.
QS-TTS: A Semi-Supervised Text-to-Speech Framework
QS-TTS is a semi-supervised TTS framework based on vector-quantized self-supervised speech representation learning (VQ-S3RL), which effectively utilizes more unlabeled speech audio to improve TTS quality while reducing its requirements for...
BOFFIN TTS: Few-shot Speaker Adaptation by Bayesian Optimization
BOFFIN TTS is a novel approach for few-shot speaker adaptation. The task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances.
FastSpeech: Fast, Robust and Controllable Text to Speech
Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate...
Libri-Light
Libri-Light is a large-scale corpus of mostly unlabeled English speech derived from LibriVox audiobooks, designed for training ASR systems with limited or no supervision. The authors used this dataset to pre-train their proposed dual-mode ASR...
Guided-TTS 2
Guided-TTS 2 is a diffusion-based generative model for high-quality adaptive text-to-speech with untranscribed data.
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models...
FastSpeech
FastSpeech is a non-autoregressive text-to-speech model, not a dataset; the original paper trains it on the LJSpeech dataset.
Non-Attentive Tacotron
Non-Attentive Tacotron is a neural text-to-speech model that combines a robust duration predictor with an autoregressive decoder.
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Zero-shot text-to-speech aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping...
Style Tokens
Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
Text-Predicted Global Style Tokens
Text-Predicted Global Style Tokens (TP-GST) extend GSTs by predicting the style embedding from the input text alone, so expressive speaking style can be synthesized without a reference audio signal at inference time.
FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech
FastSpeech 2 is a fast and high-quality text-to-speech system. It simplifies FastSpeech training by learning directly from ground-truth mel-spectrograms and conditioning on variance information (duration, pitch, and energy); its variant FastSpeech 2s synthesizes waveforms from text fully end-to-end.