Dataset - LDM

MelGAN

MelGAN: Generative adversarial networks for conditional waveform synthesis.
- Dataset
- JSON
Surprise Test Set

The surprise test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON
English Test Set

The English test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON
VCTK Corpus

The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
- Dataset
- JSON
CSTR VCTK Corpus

The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
- Dataset
- JSON
Style Tokens

Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Dataset
- JSON
Tacotron

Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Dataset
- JSON
Global Style Tokens

Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Dataset
- JSON
Text-Predicted Global Style Tokens

Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Dataset
- JSON
LJSpeech Dataset

The LJSpeech dataset is a collection of audio recordings of a single female speaker reading aloud.
- Dataset
- JSON
LJ Speech Dataset

The LJ speech dataset is a dataset of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
- Dataset
- JSON
LJSpeech and VCTK datasets

The LJSpeech dataset contains 13,100 22kHz audio clips of a female speaker. The VCTK dataset consists of 108 native English speakers with various accents.
- Dataset
- JSON
VCTK

Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
- Dataset
- JSON
LibriTTS

A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

34 datasets found