Dataset - LDM

TIMIT Corpus

The TIMIT corpus contains read-speech recordings from 630 speakers across eight major dialects of American English and is used for speech recognition, speaker recognition, and speech synthesis tasks.
AISHELL-3

This Mandarin dataset comprises over 88,000 read utterances, roughly 85 hours of speech in total.
Buckeye Speech Corpus

This English dataset consists of approximately 300,000 words of conversational speech from 40 speakers from Central Ohio, each recorded talking with an interviewer.
LibriTTS

A popular text-based voice conversion (VC) approach uses an automatic speech recognition (ASR) model to extract phonetic posteriorgrams (PPGs) as the content representation.

You can also access this registry using the API (see API Docs).
