Dataset - LDM

Chinese Prosody Prediction Dataset

Dataset used in the paper on automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features.
- Dataset
- JSON
Aozorabunko dataset

The Aozorabunko dataset, used for pre-training the PnG BERT model.
- Dataset
- JSON
Wikipedia2 and Aozorabunko datasets

The Wikipedia2 and Aozorabunko datasets, used for pre-training the PnG BERT model.
- Dataset
- JSON
Speech Corpus

A speech corpus of size 7,000 used for training and validation of the FCI module.
- Dataset
- JSON
DeviceTTS

A small-footprint, fast, stable network for on-device text-to-speech synthesis
- Dataset
- JSON
TIMIT Corpus

The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.
- Dataset
- JSON
Internal Dataset

The internal dataset contains 6 million real-world driving scenarios from Las Vegas (LV), Seattle (SEA), San Francisco (SF), and the campus of the Stanford Linear Accelerator...
- Dataset
- JSON
Corpus and voices for Catalan speech synthesis

Corpus and voices for Catalan speech synthesis.
- Dataset
- JSON
Voice Bank Corpus

The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers.
- Dataset
- JSON
Generative Pre-Training for Speech

Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate...
- Dataset
- JSON
JSUT corpus

The dataset is a large vocabulary Japanese accent dictionary built using the proposed technique.
- Dataset
- JSON
HiFi-GAN

HiFi-GAN: Generative adversarial networks for efficient and high-fidelity speech synthesis
- Dataset
- JSON
ASVspoof 2019

The ASVspoof 2019 dataset is a large-scale public dataset for speaker verification and spoofing countermeasures. The dataset contains various types of audio files, including...
- Dataset
- JSON
KSS dataset and LJSpeech dataset

The Korean Single Speaker Speech (KSS) dataset and the LJSpeech dataset, used for speech synthesis experiments.
- Dataset
- JSON
LPCNet

This dataset has no description
- Dataset
- JSON
FastSpeech

The FastSpeech dataset is a text-to-speech dataset used to train the FastSpeech model.
- Dataset
- JSON
LJSpeech-1.1

The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
- Dataset
- JSON
Reverberation and Noise Contaminated Speech Datasets

Training and test datasets were generated by contaminating the clean data with reverberation and noise.
- Dataset
- JSON
Proprietary Speech Dataset

A proprietary speech dataset consisting of 184 hours of high-quality US English speech spoken by 11 female and 10 male speakers.
- Dataset
- JSON
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive No...

SpecGrad is a proposed neural vocoder that adapts the spectral envelope of the diffusion noise based on the conditioning log-mel spectrogram.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

34 datasets found