- Surprise Test Set
The surprise test set is used to evaluate the performance of the proposed system.
- English Test Set
The English test set is used to evaluate the performance of the proposed system.
- VCTK Corpus
The VCTK corpus is an English multi-speaker dataset with 44 hours of audio spoken by 109 native English speakers.
- CSTR VCTK Corpus
The CSTR VCTK Corpus contains speech recordings from 109 speakers, with 20 utterances per speaker.
- Style Tokens
Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Global Style Tokens
Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- Text-Predicted Global Style Tokens
Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...
- LJSpeech Dataset
The LJSpeech dataset is a collection of audio recordings of a single female speaker reading aloud.
- LJ Speech Dataset
The LJ Speech dataset contains speech samples recorded from a single speaker reading passages from 7 non-fiction books.
- LJSpeech and VCTK datasets
The LJSpeech dataset contains 13,100 audio clips of a single female speaker, sampled at 22 kHz. The VCTK dataset contains recordings of 108 native English speakers with various accents.