-
End-to-End Neural Speaker Diarization with Permutation-Free Objectives
The End-to-End Neural Speaker Diarization dataset is a benchmark for speaker diarization. -
The Third DIHARD Diarization Challenge
The DIHARD dataset is a benchmark for speaker diarization. -
DNS-5 dataset
The dataset used in the paper is a benchmarking dataset for speech-to-speech translation. -
Perception of Phonological Assimilation
The dataset used in this study consists of 48 stimuli, each containing a word pair with a place assimilation, and a carrier sentence. The stimuli are designed to test the... -
Transformer based Whisper Bangla ASR model
A transformer-based Whisper Bangla ASR model. -
Bengali Medical Corpus
A comprehensive 46-hour Bengali medical corpus encompassing disease names, symptoms, and symptom severity. -
Highly-Reverberant Real Environment database (HRRE)
The Highly-Reverberant Real Environment database (HRRE) contains 13.4 hours of data recorded in real reverberant environments, covering 20 different testing conditions. -
Commandersong: a systematic approach for practical adversarial voice recognition
Commandersong: a systematic approach for practical adversarial voice recognition. -
Trojan-model: a practical trojan attack against automatic speech recognition ...
Trojan-model: a practical trojan attack against automatic speech recognition systems. -
Keyword spotting in continuous speech using convolutional neural network
Keyword spotting in continuous speech using convolutional neural network. -
Speech Command Dataset (SCD)
Speech Command Dataset (SCD) is a publicly available dataset of spoken English commands categorized into 35 distinct classes. -
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Speech recognition systems driven by Deep Neural Networks (DNNs) have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our... -
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword U...
A hierarchical conditional model for end-to-end ASR, trained by gradually increasing the subword units used for the CTC losses applied to intermediate layers. -
IPA Transcription of Bengali Texts
A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus... -
OpenSeq2Seq
The OpenSeq2Seq dataset is a speech recognition dataset used in the OpenSeq2Seq framework.