Dataset - LDM

Aozorabunko dataset

The Aozorabunko dataset, used for pre-training the PnG BERT model.

Wikipedia2 and Aozorabunko datasets

The Wikipedia2 and Aozorabunko datasets, used for pre-training the PnG BERT model.

DeviceTTS

A small-footprint, fast, and stable network for on-device text-to-speech synthesis.

LRS2

The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each video is accompanied by audio of a spoken sentence of up to 100 characters.

NEologd

A large-vocabulary Japanese accent dictionary built using the technique proposed in the associated paper.

FastSpeech

The FastSpeech dataset is a text-to-speech dataset used to train the FastSpeech model.
Style Tokens

Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...

Tacotron

Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...

Global Style Tokens

Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...

Text-Predicted Global Style Tokens

Global Style Tokens (GSTs) are a recently proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a...

LJ Speech Dataset

The LJ Speech Dataset is a collection of speech samples recorded from a single speaker reading passages from 7 non-fiction books.

VCTK

Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
LibriTTS

A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.

You can also access this registry using the API (see API Docs).

13 datasets found