Dataset - LDM

GTZAN dataset

The GTZAN dataset is a small but popular dataset for genre classification. It contains 10 musical genres, each represented by 100 audio snippets 30 s long.
- Dataset
- JSON
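The counts above are easy to sanity-check in code. This is a minimal sketch, not part of the registry. It assumes every clip is exactly 30 s long and that a train/test split is drawn separately within each genre; a stratified split is our choice here, not something the entry specifies. The names `ClipDatasetSpec`, `stratified_split`, and the `genre_*` labels are hypothetical placeholders. The sample rate passed to `samples_per_clip` must be supplied by the caller because the entry does not state one.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ClipDatasetSpec:
    """Size summary of a class-balanced dataset of fixed-length clips (hypothetical helper)."""
    num_classes: int
    clips_per_class: int
    clip_seconds: float

    @property
    def num_clips(self) -> int:
        # Balanced dataset: every class has the same number of clips.
        return self.num_classes * self.clips_per_class

    @property
    def total_hours(self) -> float:
        # Assumes every clip is exactly clip_seconds long.
        return self.num_clips * self.clip_seconds / 3600.0

    def samples_per_clip(self, sample_rate_hz: int) -> int:
        # Number of audio samples in one clip at a caller-supplied sample rate.
        return round(self.clip_seconds * sample_rate_hz)


def stratified_split(labels: Sequence[str], test_fraction: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split clip indices so every class contributes the same fraction to the test set."""
    by_class: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    rng = random.Random(seed)  # seeded so the split is reproducible
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    gtzan = ClipDatasetSpec(num_classes=10, clips_per_class=100, clip_seconds=30.0)
    print(gtzan.num_clips)               # 1000
    print(f"{gtzan.total_hours:.2f} h")  # 8.33 h
    # Hypothetical placeholder labels, one per clip (10 genres x 100 clips).
    labels = [f"genre_{c}" for c in range(10) for _ in range(100)]
    train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
    print(len(train_idx), len(test_idx))  # 800 200
```

With only 100 clips per genre, stratifying keeps every genre equally represented in both splits. In this example that means 80 training and 20 test clips per genre.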
Google Speech Command Dataset

The Google Speech Command Dataset is a dataset for keyword spotting, a speech recognition task. The dataset contains 12 classes, including 10 keywords and two extra...
- Dataset
- JSON
GTZAN

The GTZAN dataset is a collection of 1000 audio tracks, each 30 seconds long, spanning ten music genres.
- Dataset
- JSON
Speech Commands Dataset

The keyword spotting model is trained on two datasets: ESC (from "ESC: Dataset for Environmental Sound Classification") and the Speech Commands Dataset.
- Dataset
- JSON
VoxCeleb: A Large-Scale Speaker Identification Dataset
- Dataset
- JSON
AudioCaps

Audio-text retrieval aims to retrieve a target audio clip or caption from a pool of candidates, given a query in the other modality.
- Dataset
- JSON
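Retrieval of this kind is commonly implemented by embedding both modalities into a shared vector space and ranking candidates by similarity to the query. The sketch below assumes that setup, and several of its parts are assumptions rather than details from the entry:

- The embeddings are hand-written toy vectors. They stand in for the outputs of audio and text encoders, which the entry does not specify.
- Cosine similarity is the assumed scoring function.
- Recall@K is used because it is a common evaluation metric, not because the entry names it.

All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero embedding vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_candidates(query: Vector, candidates: Sequence[Vector]) -> List[int]:
    """Candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    # sorted() is stable even with reverse=True, so ties keep index order.
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: Sequence[Vector], candidates: Sequence[Vector],
                targets: Sequence[int], k: int) -> float:
    """Fraction of queries whose ground-truth candidate index is ranked in the top k."""
    hits = sum(1 for q, t in zip(queries, targets)
               if t in rank_candidates(q, candidates)[:k])
    return hits / len(queries)


if __name__ == "__main__":
    # Hypothetical toy embeddings: caption i is paired with audio clip i.
    audio_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    caption_embs = [[0.9, 0.1], [0.1, 0.9], [0.3, 0.95]]
    targets = [0, 1, 2]
    # Text-to-audio: captions are the queries, audio clips form the candidate pool.
    print(round(recall_at_k(caption_embs, audio_embs, targets, k=1), 3))  # 0.667
    print(recall_at_k(caption_embs, audio_embs, targets, k=2))            # 1.0
    # Audio-to-text: swap the roles of the two modalities.
    print(recall_at_k(audio_embs, caption_embs, targets, k=1))            # 1.0
```

The query and candidate roles are just arguments. This means the same `recall_at_k` covers both text-to-audio and audio-to-text retrieval.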
Clotho

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.
- Dataset
- JSON
Librispeech

The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

8 datasets found