35 datasets found

Formats: JSON Tags: Audio

  • TSP speech database

    The TSP speech database is a dataset of speech recordings.
  • Isolet dataset

    The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
  • PANNS

    The PANNS dataset is a large-scale audio classification dataset.
  • Freesound Dataset

    The Freesound dataset consists of 18,873 audio files, each assigned one of the 41 unique audio events from Google's AudioSet Ontology.
  • Semi Supervised Learning for Few-Shot Audio Classification by Episodic Triple...

    Few-shot learning aims to generalize to unseen classes that appear during testing but are unavailable during training. The performance of prototypical networks in extreme few-shot...
  • AISHELL-1

    The AISHELL-1 dataset is a Mandarin speech corpus, consisting of 178 hours of speech, with 11 domains and 400 speakers from different accent areas in China.
  • TEDLIUM2

    The TEDLIUM2 dataset is a large corpus of audio recordings of human speech, with a focus on speech recognition tasks.
  • WSJ0-2mix

    The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
  • Demand

    Deep Matrix Approximately Nonlinear Decomposition (DEMAND) employs Adam and an alternative optimization strategy that is well suited to optimize convex/alternative convex...
  • WavCaps

    The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
  • COVID-19 Identification ResNet (CIdeR)

    The COVID-19 Identification ResNet (CIdeR) dataset consists of 517 crowdsourced coughing and breathing audio recordings from 355 participants, of which 62 participants had tested...
  • VoiceBank DEMAND dataset

    Speech enhancement dataset
  • TIMIT dataset

    The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.
  • TIMIT

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
  • Schizophrenia Spectrum Dataset

    The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
  • CHiME-4

    The CHiME-4 dataset is a large-scale speech recognition dataset, containing over 2 hours of speech from 6 channels.
  • L2-Arctic

    A dataset used for mispronunciation detection for second-language learners.
  • GTZAN

    The GTZAN dataset is a comprehensive collection of 1,000 audio tracks, each 30 seconds long, representing ten diverse music genres.
  • LJSpeech-1.1

    The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
You can also access this registry using the API (see API Docs).