Correction Focused Language Model Training for Speech Recognition
Language models have been commonly adopted to boost the performance of automatic speech recognition (ASR), particularly in domain adaptation tasks. Conventional way of LM...
InterFormer: Interactive Local and Global Features Fusion for Automatic Speec...
The local and global features are both essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can...
INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile ...
The subject of this paper is a Conv1D-equipped ASR model deployed on mobile devices, accelerated with INT8 Winograd convolution.
LIBRIHEAVY: A 50,000 HOURS ASR CORPUS WITH PUNCTUATION, CASING AND CONTEXT
Libriheavy is a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest...
ATIS dataset
The ATIS dataset is a benchmark dataset for spoken language understanding, consisting of audio recordings and corresponding manual transcripts about humans asking for flight...
TEDLIUM Corpus
The TED-LIUM corpus is a large-scale corpus of TED talk recordings with transcripts, used for speech recognition and text summarization.
How2 Dataset
The How2 dataset consists of instructional videos taken from YouTube, paired with transcripts and text summaries.
TED Speech Summarization Corpus
Speech summarization, which generates a text summary from speech, can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
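The cascade described above can be sketched as two composed callables; `asr_model` and `ts_model` here are hypothetical stand-ins for any ASR and TS systems, not the paper's actual models:

```python
def summarize_speech(audio, asr_model, ts_model):
    """Cascade approach to speech summarization: transcribe, then summarize.

    `asr_model` and `ts_model` are hypothetical callables standing in for
    arbitrary ASR and text-summarization systems; this is a sketch of the
    cascade idea, not a specific implementation.
    """
    transcript = asr_model(audio)   # speech -> text
    summary = ts_model(transcript)  # text -> summary
    return summary
```

End-to-end alternatives exist, but the cascade keeps the two components independently trainable and swappable.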
OpenSubtitles dataset
Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
TED2012 ASR and MT dataset
The dataset used in the paper is a collection of English ASR hypotheses from the eight submissions on the tst2012 test set in the IWSLT 2013 TED talk ASR track, along with...
Libri-Light
The dataset used in the paper is Libri-Light, a large corpus of mostly unlabeled English speech derived from LibriVox audiobooks (the same source as LibriSpeech, though not a subset of it). The authors used this dataset to pre-train their proposed dual-mode ASR...
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
The HKUST/MTS corpus is a large-scale corpus of spontaneous Mandarin telephone conversations between native speakers.
The Wall Street Journal Corpus
The WSJ corpus is a large corpus of read speech recordings of Wall Street Journal news text, a standard benchmark for large-vocabulary continuous speech recognition.
TIMIT Acoustic-Phonetic Continuous Speech Corpus
The TIMIT acoustic-phonetic continuous speech corpus CD-ROM contains recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
SpecAugment is a data augmentation method for automatic speech recognition, which masks the mel-spectrogram along the time and frequency axes.
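The time- and frequency-masking described above can be sketched as follows; the parameter names, default mask widths, and mask value of zero are illustrative rather than the paper's exact policy, and the third SpecAugment transform (time warping) is omitted:

```python
import numpy as np

def spec_augment(mel, num_time_masks=2, num_freq_masks=2,
                 max_time_width=30, max_freq_width=13, rng=None):
    """Mask random time and frequency bands of a (frames, mel_bins) spectrogram.

    A minimal sketch of SpecAugment-style masking; widths are drawn
    uniformly up to the given maxima, and masked regions are zeroed.
    """
    rng = np.random.default_rng() if rng is None else rng
    mel = mel.copy()                 # leave the caller's array untouched
    n_frames, n_bins = mel.shape
    for _ in range(num_freq_masks):  # mask bands along the frequency axis
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_bins - w + 1)))
        mel[:, f0:f0 + w] = 0.0
    for _ in range(num_time_masks):  # mask spans along the time axis
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_frames - w + 1)))
        mel[t0:t0 + w, :] = 0.0
    return mel
```

Because the masks are sampled fresh each call, the same utterance yields a different augmented view every epoch.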
MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
MixSpeech is a data augmentation method for automatic speech recognition, which trains an ASR model by taking a weighted combination of two different speech features as the...
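The weighted combination described above resembles mixup applied to speech features. A minimal sketch, where the Beta-distributed mixing weight and the equal-shape assumption are ours rather than necessarily the paper's exact recipe:

```python
import numpy as np

def mix_speech(feat_a, feat_b, alpha=0.5, rng=None):
    """Return a convex combination of two speech feature tensors.

    A mixup-style sketch: lam is drawn from Beta(alpha, alpha), and the
    returned lam would also weight the two recognition losses against the
    two reference transcripts. Inputs are assumed to have equal shapes;
    real batches would need padding or cropping first.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    mixed = lam * feat_a + (1.0 - lam) * feat_b
    return mixed, lam
```

Training then computes the ASR loss against both transcripts, weighted by `lam` and `1 - lam` respectively.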
WSJ and Switchboard datasets
The 80-hour WSJ and 300-hour Switchboard datasets are used for end-to-end speech recognition.