Natural Language Processing - Groups

A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus...

Corpus of Spoken Dutch

The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings.

Language Models of Spoken Dutch

The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT. The dataset is used to train language models of spoken Dutch.

TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...

Sanskrit ASR dataset

A dataset for Sanskrit ASR

वाक् सञ्चयः (/Vāksañcayaḥ/)

A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit

Masked Acoustic Unit for Mispronunciation Detection and Correction

The proposed method uses the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction.

English and Luganda datasets for ASR-free keyword spotting

South African English and Luganda datasets

Feature learning for efficient ASR-free keyword spotting in low-resource lang...

ASR-free keyword spotting in low-resource languages

Google Speech Commands Dataset

The Google Speech Commands Dataset contains 64,727 one-second utterance files, each recorded and labeled with one of 30 target categories.

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide...

Switchboard

Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment.

18 datasets found