Audio Processing - Groups

Multi-clue-TSE-data

A multi-modal target sound extraction (TSE) dataset built from two public corpora, AudioSet and AudioCaps.
AudioMNIST dataset

The AudioMNIST dataset contains 30,000 audio recordings of spoken digits (0 to 9) from 60 speakers.
Acoustic AVSpeech

The Acoustic AVSpeech dataset is a benchmark for visual acoustic matching.
SoundSpaces-Speech

The SoundSpaces-Speech dataset is a benchmark for visual acoustic matching.
AVA-Speech

The AVA-Speech dataset is a publicly available dataset of movies densely labeled with speech activity.
A Hybrid CNN-BiLSTM Voice Activity Detector

A hybrid CNN-BiLSTM voice activity detector (VAD) incorporating both a convolutional neural network (CNN) and bidirectional long short-term memory...
LJSpeech Dataset

The LJSpeech dataset consists of 13,100 short audio clips of a single female speaker reading passages from seven non-fiction books.
LibriSpeech

The LibriSpeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech, sampled at 16 kHz and derived from LibriVox audiobooks.
LibriLight

The dataset used in this paper is a large-scale production ASR system, which includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...
Google Speech Commands Dataset Version II

The Google Speech Commands Dataset Version II contains 105,829 utterances of 35 words from 2,618 speakers with a sampling rate of 16 kHz.
