Dataset - LDM

VOiCES

The VOiCES dataset is used for testing the speaker recognition system. The dataset contains 7323 identities combined.
- Dataset
- JSON
WSJ0-2mix

The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
- Dataset
- JSON
Wall Street Journal

The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.
- Dataset
- JSON
ASVspoof2019

The ASVspoof2019 LA subset consists of three parts: training, development, and evaluation. Each partition has a disjoint set of speakers. The average duration of the utterances...
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
MSP-IMPROV

The MSP-IMPROV dataset contains 6 sessions of dyadic interactions between pairs of male-female actors. 15 target sentences are used to collect the recordings. For each target...
- Dataset
- JSON
Speech Commands

The Speech Commands dataset consists of 105,809 one-second audio recordings of 35 spoken words sampled at 16 kHz. The raw speech commands dataset presents audio recordings as a...
- Dataset
- JSON
LJSpeech Dataset

The LJSpeech dataset is a collection of audio recordings of a single female speaker reading aloud.
- Dataset
- JSON
LJ Speech Dataset

The LJ Speech dataset consists of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
- Dataset
- JSON
ESC-10

The ESC-10 dataset is a subset of the ESC-50 dataset with 10 classes and 400 recordings.
- Dataset
- JSON
ESC-50

The dataset used for training the CNN in cough detection is composed of various modified audio clips gathered from open-source online sources. Each of these audio files...
- Dataset
- JSON
VoxCeleb1

Speaker recognition aims to identify speaker information from input speech. A type of speaker recognition is speaker verification (SV). It determines whether the test speaker's...
- Dataset
- JSON
LibriSpeech dataset

The dataset used in the paper is the LibriSpeech dataset, which contains about 1,000 hours of English speech derived from audiobooks.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

13 datasets found