97 datasets found

Tags: Speech Recognition

  • AccentNet

    The accented English speech recognition challenge 2020: Open datasets, tracks, baselines, results and methods.
  • GigaSpeech

    GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio.
  • TED-LIUM 3

    TED-LIUM 3 (TL3) is a TED talks dataset. The speaker adaptation data for TL3 was split randomly: 2/5 went to the train set, 1/5 to the dev set, and...
  • ASRU 2019 Mandarin-English code-switching speech recognition challenge

    The ASRU 2019 Mandarin-English code-switching speech recognition challenge dataset.
  • Wall Street Journal

    The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.
  • Video Corpus

    A corpus of free and representative video content was gathered. It includes progressive-scan videos at 1280x720 resolution, with frame rates between 24-30 frames...
  • Convolutional Neural Networks for Speech Recognition

    The Speech Recognition dataset is used for speech recognition tasks.
  • Data set B

    The dataset used for performing continuous speech recognition experiments using EEG features.
  • Data set A and B

    The dataset used for performing isolated and continuous speech recognition experiments using EEG features.
  • LibriSpeech: An ASR Corpus Based on Public Domain Audio Books

    LibriSpeech: an ASR corpus based on public domain audio books.
  • Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context

    Libriheavy is a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest...
  • CHiME-2

    The CHiME-2 dataset is a speech separation and recognition challenge dataset. It contains 7138 utterances of 8 speakers, each with 10 seconds of speech.
  • MHINT

    The MHINT corpus is a Mandarin Chinese speech corpus used for speech recognition and speech enhancement. It contains 480 utterances of 10 speakers, each with 10 seconds of speech.
  • DeepMine

    DeepMine is a Persian speech corpus.
  • Google Speech Command

    Google Speech Command (GSC) is a dataset for limited-vocabulary speech recognition.
  • ATIS dataset

    The ATIS dataset is a benchmark dataset for spoken language understanding, consisting of audio recordings and corresponding manual transcripts about humans asking for flight...
  • UK-PODS-ALIGN

    This work showcases a cost-effective method for generating training data for speech processing tasks. UK-PODS-ALIGN is a dataset that features modern conversational...
  • MCV-10

    This work showcases a cost-effective method for generating training data for speech processing tasks. MCV-10 is a multilingual dataset that contains 50 hours of...
  • UK-PODS

    This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS dataset features modern conversational Ukrainian.
  • TIMIT Corpus

    The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.
You can also access this registry using the API (see API Docs).