34 datasets found

Tags: Speech Synthesis

  • MelGAN

    MelGAN: Generative adversarial networks for conditional waveform synthesis.
  • Surprise Test Set

    The surprise test set is used for evaluating the performance of the proposed system.
  • English Test Set

    The English test set is used for evaluating the performance of the proposed system.
  • VCTK Corpus

    The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
  • CSTR VCTK Corpus

    The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
  • Style Tokens

    Global Style Tokens (GSTs) are a recently proposed method for learning latent, disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
  • Tacotron
  • Global Style Tokens
  • Text-Predicted Global Style Tokens
  • LJSpeech Dataset

    The LJSpeech dataset is a collection of audio recordings of a single female speaker reading aloud.
  • LJ Speech Dataset

    The LJ Speech dataset consists of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
  • LJSpeech and VCTK datasets

    The LJSpeech dataset contains 13,100 audio clips of a single female speaker, sampled at 22 kHz. The VCTK dataset consists of recordings of 108 native English speakers with various accents.
  • VCTK

    Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
  • LibriTTS

    A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgrams (PPGs) as the content representation.