41 datasets found

Tags: Audio

Filter Results
  • Google LLC dataset

    A dataset of audio recordings used for training and evaluation of the Binaural Angular Separation Network.
  • LibriVox and Freesound datasets

    A combination of LibriVox and Freesound datasets used for training and evaluation.
  • VoxCeleb2

    The VoxCeleb2 dataset is a large-scale speaker recognition dataset, containing 2,442 hours of raw speech from 6,112 speakers.
  • Piano-midi.de

    A polyphonic music dataset represented as multivariate time series. Analyses are typically reported in terms of efficiency and prediction accuracy on four polyphonic music tasks.
  • CL4AC: A Contrastive Loss for Audio Captioning

    Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.
  • Mozart Dataset

    The dataset consists of 13 Mozart pieces for training, 989 pieces for validation, and 11,821 pieces for testing.
  • TSP speech database

    The TSP speech database is a dataset of speech recordings.
  • Isolet dataset

    The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
  • PANNS

    The PANNS dataset is a large-scale audio classification dataset.
  • Freesound Dataset

    The Freesound dataset consists of 18,873 audio files, each assigned one of 41 unique audio event labels from Google's AudioSet Ontology.
  • Semi Supervised Learning for Few-Shot Audio Classification by Episodic Triple...

    Few-shot learning aims to generalize unseen classes that appear during testing but are unavailable during training. The performance of prototypical networks in extreme few-shot...
  • AISHELL-1

    The AISHELL-1 dataset is a Mandarin speech corpus consisting of 178 hours of speech covering 11 domains, recorded by 400 speakers from different accent areas of China.
  • TEDLIUM2

    The TEDLIUM2 dataset is a large corpus of audio recordings of human speech, with a focus on speech recognition tasks.
  • WSJ0-2mix

    The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
  • DEMAND

    DEMAND (Diverse Environments Multichannel Acoustic Noise Database) provides multichannel recordings of real-world noise captured in a variety of indoor and outdoor settings.
  • WavCaps

    The WavCaps dataset contains ChatGPT-assisted weakly-labeled audio captioning data.
  • COVID-19 Identification ResNet (CIdeR)

    The COVID-19 Identification ResNet (CIdeR) dataset consists of 517 crowdsourced coughing and breathing audio recordings from 355 participants, of which 62 participants had tested...
  • VoiceBank DEMAND dataset

    A speech enhancement dataset pairing clean speech from the VoiceBank corpus with noise recordings from the DEMAND database.
  • TIMIT dataset

    The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing over 200,000 10-second video clips with corresponding audio tracks.
You can also access this registry using the API (see API Docs).
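As a minimal sketch of programmatic access, the snippet below builds a query URL that filters the registry by tag. The base URL and parameter names (`tag`, `page`) are assumptions for illustration; consult the API Docs for the actual endpoint and query schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- replace with the base URL from the API Docs.
BASE_URL = "https://example.org/api/datasets"

def build_query(tag: str, page: int = 1) -> str:
    """Build a registry query URL filtering datasets by tag.

    The 'tag' and 'page' parameter names are assumed, not confirmed
    by the API documentation.
    """
    return f"{BASE_URL}?{urlencode({'tag': tag, 'page': page})}"

# Reproduce the filter shown on this page (Tags: Audio).
print(build_query("Audio"))
# → https://example.org/api/datasets?tag=Audio&page=1
```

The returned URL could then be fetched with any HTTP client to retrieve the same filtered listing shown above.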