Dataset - LDM

TSP speech database

The TSP speech database is a dataset of speech recordings.
- Dataset
- JSON
Isolet dataset

The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
- Dataset
- JSON
PANNS

The PANNS dataset is a large-scale audio classification dataset.
- Dataset
- JSON
Freesound Dataset

The Freesound dataset consists of 18,873 audio files, each assigned one of 41 unique audio event labels from Google's AudioSet Ontology.
- Dataset
- JSON
Semi Supervised Learning for Few-Shot Audio Classification by Episodic Triple...

Few-shot learning aims to generalize to unseen classes that appear during testing but are unavailable during training. The performance of prototypical networks in extreme few-shot...
- Dataset
- JSON
AISHELL-1

The AISHELL-1 dataset is a Mandarin speech corpus, consisting of 178 hours of speech, with 11 domains and 400 speakers from different accent areas in China.
- Dataset
- JSON
TEDLIUM2

The TEDLIUM2 dataset is a large corpus of audio recordings of human speech, with a focus on speech recognition tasks.
- Dataset
- JSON
WSJ0-2mix

The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
- Dataset
- JSON
Demand

Deep Matrix Approximately Nonlinear Decomposition (DEMAND) employs Adam and an alternative optimization strategy that is well suited to optimize convex/alternative convex...
- Dataset
- JSON
WavCaps

The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
- Dataset
- JSON
COVID-19 Identification ResNet (CIdeR)

The COVID-19 Identification ResNet (CIdeR) dataset consists of 517 crowdsourced coughing and breathing audio recordings from 355 participants, of which 62 participants had tested...
- Dataset
- JSON
VoiceBank DEMAND dataset

Speech enhancement dataset
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
VGGSound

The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.
- Dataset
- JSON
TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
- JSON
Schizophrenia Spectrum Dataset

The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
- Dataset
- JSON
CHiME-4

The CHiME-4 dataset is a large-scale speech recognition dataset, containing over 2 hours of speech from 6 channels.
- Dataset
- JSON
L2-Arctic

L2-Arctic is a dataset used for mispronunciation detection for second-language learners.
- Dataset
- JSON
GTZAN

The GTZAN dataset is a comprehensive collection of 1000 audio tracks, each 30 seconds long, representing ten diverse music genres.
- Dataset
- JSON
LJSpeech-1.1

The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

35 datasets found