-
ASRU 2019 Mandarin-English code-switching speech recognition challenge
The ASRU 2019 Mandarin-English code-switching speech recognition challenge dataset. -
Wall Street Journal
The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees. -
Video Corpus
A corpus of free and representative video content was gathered. This corpus includes videos with progressive scanning, 1280x720 resolution, and framerates between 24-30 frames... -
Correction Focused Language Model Training for Speech Recognition
Language models have been commonly adopted to boost the performance of automatic speech recognition (ASR), particularly in domain adaptation tasks. Conventional way of LM... -
AudioMNIST dataset
The dataset used in the paper is the AudioMNIST dataset, which contains 30,000 audio recordings. -
Convolutional Neural Networks for Speech Recognition
The Speech Recognition dataset is used for speech recognition tasks. -
Data set B
The dataset used for performing continuous speech recognition experiments using EEG features. -
Data set A and B
The dataset used for performing isolated and continuous speech recognition experiments using EEG features. -
LibriSpeech: An ASR Corpus Based on Public Domain Audio Books
LibriSpeech: an ASR corpus based on public domain audio books. -
GTZAN dataset
The GTZAN dataset is a small but popular dataset for genre classification, containing 10 musical genres, with each genre having 100 audio snippets of 30 s length. -
Free Spoken Digit Dataset
The dataset is a collection of 8 kHz audio recordings of spoken digits from 'zero' to 'nine'. -
Lwazi speech corpus
Collecting and evaluating speech recognition corpora for nine Southern Bantu languages -
NCHLT speech corpus
The NCHLT speech corpus of the South African languages -
INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile ...
The dataset used in this paper is a Conv1D equipped ASR model deployed on mobile devices. -
Attention-based beamformers for multi-channel speech recognition
The proposed 2D Conv-Attention model is compared with a traditional neural beamformer and multi-head attention based model. -
People’s Speech
The People’s Speech: A large-scale diverse English speech recognition dataset for commercial usage. -
Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context
Libriheavy is a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest...