41 datasets found

Tags: Audio

Filter Results
  • Google LLC dataset

    A dataset of audio recordings used for training and evaluation of the Binaural Angular Separation Network.
  • LibriVox and Freesound datasets

    A combination of LibriVox and Freesound datasets used for training and evaluation.
  • VoxCeleb2

    The VoxCeleb2 dataset is a large-scale speaker recognition dataset, containing 2,442 hours of raw speech from 6,112 speakers.
  • Piano-midi.de

    A polyphonic music dataset represented as multivariate time series. Analyses are typically reported in terms of efficiency and prediction accuracy on four polyphonic music tasks.
  • CL4AC: A Contrastive Loss for Audio Captioning

    Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.
  • Mozart Dataset

    The dataset consists of 13 Mozart pieces for training, 989 pieces for validation, and 11,821 pieces for testing.
  • TSP speech database

    The TSP speech database is a dataset of speech recordings.
  • Isolet dataset

    The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
  • PANNS

    The PANNS dataset is a large-scale audio classification dataset.
  • Freesound Dataset

    The Freesound dataset consists of 18,873 audio files, each assigned one of 41 unique audio event labels from Google's AudioSet Ontology.
  • Semi Supervised Learning for Few-Shot Audio Classification by Episodic Triple...

    Few-shot learning aims to generalize unseen classes that appear during testing but are unavailable during training. The performance of prototypical networks in extreme few-shot...
  • AISHELL-1

    The AISHELL-1 dataset is a Mandarin speech corpus consisting of 178 hours of speech covering 11 domains, recorded by 400 speakers from different accent areas of China.
  • TEDLIUM2

    The TEDLIUM2 dataset is a large corpus of audio recordings of human speech, with a focus on speech recognition tasks.
  • WSJ0-2mix

    The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
  • DEMAND

    DEMAND (Diverse Environments Multichannel Acoustic Noise Database) provides multichannel recordings of real-world noise captured in a variety of indoor and outdoor settings.
  • WavCaps

    The WavCaps dataset contains ChatGPT-assisted weakly-labeled audio captioning data.
  • COVID-19 Identification ResNet (CIdeR)

    The COVID-19 Identification ResNet (CIdeR) dataset consists of 517 crowdsourced coughing and breathing audio recordings from 355 participants, of which 62 participants had tested...
  • VoiceBank DEMAND dataset

    A speech enhancement dataset pairing clean speech from the VoiceBank corpus with noise recordings from the DEMAND database.
  • TIMIT dataset

    The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing over 200,000 10-second video clips with corresponding audio tracks.
You can also access this registry using the API (see API Docs).
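As a minimal sketch of programmatic access, the snippet below builds a query URL that filters the registry by tag. The base URL and parameter names (`tag`, `page`) are assumptions for illustration; consult the API Docs for the actual endpoint and query schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- replace with the base URL from the API Docs.
BASE_URL = "https://example.org/api/datasets"

def build_query(tag: str, page: int = 1) -> str:
    """Build a registry query URL filtering datasets by tag.

    The 'tag' and 'page' parameter names are assumed, not confirmed
    by the API documentation.
    """
    return f"{BASE_URL}?{urlencode({'tag': tag, 'page': page})}"

# Reproduce the filter shown on this page (Tags: Audio).
print(build_query("Audio"))
# → https://example.org/api/datasets?tag=Audio&page=1
```

The returned URL could then be fetched with any HTTP client to retrieve the same filtered listing shown above.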