Neural Codec Language Models
Neural codec language models are zero-shot text-to-speech synthesizers.
Diffusion Models for Minimally-Supervised Speech Synthesis
A minimally-supervised speech synthesis method based on diffusion models. Introduces the CTAP method as an intermediate semantic representation and uses...
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model, and an audio synthesis module.
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models...
CSTR VCTK Corpus
The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each reading about 400 sentences.
VCTK Dataset
The VCTK dataset is a large corpus of speech recordings, each of a single speaker reading a single sentence.
LJSpeech Dataset
The LJSpeech dataset is a collection of audio recordings of a single female speaker reading aloud.
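LJSpeech ships its transcripts in a pipe-delimited `metadata.csv`, where each line holds a clip ID, the raw transcript, and a normalized transcript. A minimal sketch of parsing that layout (the sample lines below are illustrative, not quoted from the corpus):

```python
def parse_ljspeech_metadata(text):
    """Parse LJSpeech-style metadata lines of the form
    'clip_id|raw transcript|normalized transcript' into dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Split on the first two pipes only, so transcripts may contain '|'-free text.
        clip_id, raw, normalized = line.split("|", 2)
        rows.append({"id": clip_id, "raw": raw, "normalized": normalized})
    return rows

# Illustrative sample in the metadata.csv layout.
sample = (
    "LJ001-0001|Printing, in the only sense|Printing, in the only sense\n"
    "LJ001-0002|in being comparatively modern.|in being comparatively modern."
)
rows = parse_ljspeech_metadata(sample)
print(rows[0]["id"])  # LJ001-0001
```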
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
FastDiff is a fast conditional diffusion model for high-quality speech synthesis. It employs a stack of time-aware location-variable convolutions with diverse receptive field...
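Diffusion vocoders such as FastDiff build on the standard denoising-diffusion setup, where the forward process admits a closed-form sample x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. A minimal sketch of that generic forward noising step (the linear beta schedule and sizes are illustrative assumptions, not FastDiff's actual configuration):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta_s)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.05, 100)    # illustrative linear noise schedule
x0 = rng.standard_normal(16000)         # stand-in for one second of waveform
xt, eps = forward_diffuse(x0, t=99, betas=betas, rng=rng)
# At the final step alpha_bar is small, so x_t is dominated by noise;
# the model is trained to predict eps (or x0) from x_t and t.
```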
LJ Speech Dataset
The LJ Speech dataset consists of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
Hi-Fi Multi-Speaker English TTS dataset
The Hi-Fi Multi-Speaker English TTS dataset is used to generate training, validation, and test inputs for the audio splicing detection and localization task.
LibriSpeech dataset
The LibriSpeech dataset contains about 1,000 hours of English speech derived from audiobooks.
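LibriSpeech stores transcripts per chapter in `*.trans.txt` files, one utterance per line in the form `<speaker>-<chapter>-<utterance> TRANSCRIPT`. A minimal sketch of reading that layout (the sample lines below are illustrative, not quoted from the corpus):

```python
def parse_trans_file(text):
    """Parse LibriSpeech-style *.trans.txt lines of the form
    '<spk>-<chap>-<utt> TRANSCRIPT' into an id -> transcript dict."""
    transcripts = {}
    for line in text.strip().splitlines():
        # The utterance ID is everything before the first space.
        utt_id, transcript = line.split(" ", 1)
        transcripts[utt_id] = transcript
    return transcripts

# Illustrative sample in the trans.txt layout.
sample = (
    "84-121123-0000 GO DO YOU HEAR\n"
    "84-121123-0001 BUT IN LESS THAN FIVE MINUTES"
)
trans = parse_trans_file(sample)
print(trans["84-121123-0000"])  # GO DO YOU HEAR
```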